Program Description 1

Head Start is a national, federally funded program that provides services to promote school readiness for children from birth to age 5 from predominantly low-income families. 2 These services are provided to both children and their families and include education, health and nutrition, family engagement, and other social services.

Head Start program administrators are given the flexibility to design service delivery to be responsive to the cultural, linguistic, and other contextual needs of local communities, which leads to considerable variability in the services offered. Head Start service models also vary according to family needs: children and families may be served through center-based or family child care, home visits, or a combination of these, in programs that operate full or half days for 8-12 months per year. 3

This review focuses on the effects of Head Start programs designed for children ages 3-5. These programs include a variety of Head Start service models.

Research 4 

The What Works Clearinghouse (WWC) identified one study of Head Start that both falls within the scope of the Early Childhood Education topic area and meets WWC group design standards. 5 The study meets WWC group design standards without reservations, and no studies meet WWC group design standards with reservations. The study included 3,697 three- and four-year-old children in a nationally representative sample.

The WWC considers the extent of evidence for Head Start on the school readiness outcomes of 3- and 4-year-old children to be small for three outcome domains: general reading achievement, mathematics achievement, and social-emotional development. No studies that meet standards reported findings in the five other domains, so this intervention report does not report on the effectiveness of Head Start for those domains. (See the Effectiveness Summary on p. 5 for more details of effectiveness by domain.)

Effectiveness 

Head Start was found to have potentially positive effects on general reading achievement and no discernible effects 
on mathematics achievement and social-emotional development for 3- and 4-year-old children. 
Table 1. Summary of findings 6

                                                            Improvement index    Number of  Number of  Extent of
Outcome domain                Rating of effectiveness       (percentile points)  studies    students   evidence
                                                            Average  Range
General reading achievement   Potentially positive effects  +13      +12 to +14  1          3,697      Small
Mathematics achievement       No discernible effects        +3       na          1          1,617      Small
Social-emotional development  No discernible effects        +1       -1 to +5    1          3,693      Small

na = not applicable
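The improvement index in Table 1 restates an effect size in percentile points. The sketch below assumes the WWC convention that the index is the expected percentile rank of an average intervention-group child within the comparison group, minus 50, which for a standardized mean difference g equals 100 * Phi(g) - 50 under a normal approximation. The function names are hypothetical, and because the averages in Table 1 are rounded, back-calculating from them gives only rough effect sizes, not the values estimated in the study.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def improvement_index(effect_size: float) -> float:
    """Percentile-point gain implied by a standardized mean difference (assumed WWC convention)."""
    return 100 * _STD_NORMAL.cdf(effect_size) - 50


def effect_size_from_index(index: float) -> float:
    """Approximate inverse: the effect size implied by an improvement index."""
    return _STD_NORMAL.inv_cdf((index + 50) / 100)


if __name__ == "__main__":
    # Rounded domain averages from Table 1; the recovered effect sizes are rough.
    for domain, index in [
        ("General reading achievement", 13),
        ("Mathematics achievement", 3),
        ("Social-emotional development", 1),
    ]:
        g = effect_size_from_index(index)
        print(f"{domain}: index {index:+d} -> effect size of roughly {g:.2f}")
```

The standard-library `statistics.NormalDist` supplies both the normal CDF and its inverse, so the sketch needs no third-party packages.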
Program Information

Background

Head Start was first launched as an 8-week demonstration project in the summers of 1965 and 1966. Since then, Head Start has served more than 30 million children and has grown to include a multitude of program options. Head Start was most recently reauthorized in 2007. The program is currently administered by the Administration for Children and Families in the U.S. Department of Health and Human Services. Address: Office of Head Start, 1250 Maryland Ave., SW, Washington, DC 20024. Email: HeadStart@eclkc.info. Web: http://www.acf.hhs.gov/programs/ohs. Telephone: (866) 763-6481.

Program details 

Head Start is a national, federally funded program for preschool children from low-income families. Head Start’s 
main purpose is to prepare children for school. The program seeks to promote school readiness by bolstering a 
child’s development and learning. Services focus on language and literacy skills, cognition and general knowledge, 
physical development and health, social and emotional development, and approaches to learning. Programs may 
be based in centers or schools, family child care homes, and children’s own homes (home visits). 

Head Start services are designed to be responsive to each child’s and family’s ethnic, cultural, and linguistic heritage. Children who attend Head Start participate in a variety of educational activities. They also receive free medical, dental, and mental health screenings. They are provided with healthy meals and snacks, as well as opportunities to play indoors and outdoors in a safe setting. In addition, Head Start programs work with children’s families, helping them access regular health care and community resources for low-income families and engage actively in their children’s development and early learning.

Head Start programs work directly with local agencies and can differ based on community needs. The Office of 
Head Start (http://www.acf.hhs.gov/programs/ohs) is responsible for oversight of grantees, quality assurance, and 
technical assistance for program staff in their delivery of services to children and families. 


Cost 

Information on the cost of implementing Head Start is available from the Office of Head Start. As a rule, local grantees must provide a 20% cash or in-kind match to federal funds. No more than 15% of total program costs may be used for program administration. In addition, some localities offer funding to extend Head Start to additional children within their jurisdiction.
Research Summary

The WWC identified 40 eligible studies that investigated the effects 
of Head Start on the school readiness of preschool-aged children. An 
additional 50 studies were identified but do not meet WWC eligibility 
criteria for review in this topic area. Citations for all 90 studies are in the 
References section, which begins on p. 7. 

The WWC reviewed 40 eligible studies against group design standards. 

One study (U.S. Department of Health and Human Services, Administration for Children and Families [DHHS ACF], 2010) is a randomized controlled trial that meets WWC group design standards without reservations. The study is summarized in this report. Thirty-nine studies do not meet WWC group design standards.

Summary of study meeting WWC group design standards without reservations 

DHHS ACF (2010) reports on the impact of Head Start on the development of preschool children’s school readiness skills. The authors of this study created a nationally representative sample by first randomly selecting 84 grantee or delegate agencies 7 and 378 Head Start centers in 23 states to conduct the experiment. 8 Children in the comparison group experienced diverse types of early care and education settings, ranging from parent-only care to programs similar in type and services to typical Head Start programs.

The study design and presentation of impacts focused on two cohorts of children. 

Three-year-old cohort: The DHHS ACF (2010) study included 3-year-old children whose families were applying to the selected Head Start programs for the first time. These children were then randomly assigned either to be offered Head Start (1,278 children) or to be in the comparison group (784 children). 9 The authors reported that 17.3% of those assigned to the comparison group actually enrolled in a Head Start program that was not selected for inclusion in the study. 10

Four-year-old cohort: The DHHS ACF (2010) study included 4-year-old children whose families were applying to the selected Head Start programs for the first time. These children were then randomly assigned either to be offered Head Start (1,008 children) or to be in the comparison group (627 children). 11 The authors reported that 13.9% of those assigned to the comparison group actually enrolled in a Head Start program that was not selected for inclusion in the study. 12
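The cohort counts above can be checked against the sample size reported for the study. The sketch below uses only the figures given in this section: it sums the randomized groups and converts the reported crossover percentages into approximate numbers of comparison-group children who enrolled in a nonstudy Head Start program. The dictionary layout and helper names are hypothetical, and the rounded crossover counts are estimates, not figures reported by the study.

```python
# Random-assignment counts and crossover rates reported for DHHS ACF (2010).
cohorts = {
    "3-year-old": {"head_start": 1278, "comparison": 784, "crossover_rate": 0.173},
    "4-year-old": {"head_start": 1008, "comparison": 627, "crossover_rate": 0.139},
}


def cohort_total(cohort: dict) -> int:
    """Children randomized in one cohort (offered Head Start plus comparison)."""
    return cohort["head_start"] + cohort["comparison"]


def approx_crossovers(cohort: dict) -> int:
    """Rounded estimate of comparison children who enrolled in another Head Start program."""
    return round(cohort["comparison"] * cohort["crossover_rate"])


total = sum(cohort_total(c) for c in cohorts.values())
for name, c in cohorts.items():
    print(f"{name}: {cohort_total(c)} randomized, about {approx_crossovers(c)} crossovers")
print(f"Total randomized: {total}")
```

Running it gives 2,062 and 1,635 randomized children for the two cohorts and a total of 3,697, which matches the general reading achievement sample in Table 1; the mathematics and social-emotional analyses in Table 1 rest on slightly smaller samples.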

The study findings and summary of effectiveness presented in the main body of this report are based on total score outcomes that were measured at the end of the children’s first year in Head Start (2003) and represent the immediate effects of Head Start. 13 Subtest outcomes that meet WWC group design standards for this first year are presented as supplemental outcomes in Appendix D. 14 Following the prioritization of immediate outcomes in the Early Childhood Education review protocol (version 3.0), findings for the 4-year-old cohort on outcome measures collected in spring 2004, 2005, and 2007 are also presented as supplemental outcomes in Appendix D, because they represent intermediate to longer-term follow-up effects of Head Start at kindergarten, first grade, and third grade, respectively.

Summary of studies meeting WWC group design standards with reservations 

No studies of Head Start met WWC group design standards with reservations. 


Table 2. Scope of reviewed research

Grades            PK
Delivery method   Individual, Small group, Whole class, Whole school
Program type      School level
Effectiveness Summary

The WWC review of Head Start for the Early Childhood Education topic area includes student outcomes in eight domains: alphabetics, cognition, comprehension, fluency, general reading achievement, language development, mathematics achievement, and social-emotional development. The one study of Head Start that meets WWC group design standards without reservations reported findings in three of the eight domains: (a) general reading achievement, (b) mathematics achievement, and (c) social-emotional development. The findings below present both the authors’ estimates and WWC-calculated estimates of the size and statistical significance of the effects of Head Start on 3- and 4-year-old children. Additional comparisons are presented as supplemental findings in Appendix D. The supplemental findings do not factor into the intervention’s ratings of effectiveness. For a more detailed description of the rating of effectiveness and extent of evidence criteria, see the WWC Rating Criteria on p. 34.

Summary of effectiveness for the general reading achievement domain 

One study that meets WWC group design standards without reservations reported findings in the general reading 
achievement domain. 

The DHHS ACF (2010) study investigated one outcome in the general reading achievement domain that meets WWC group design standards without reservations: the Parent Emergent Literacy Scale (PELS). The PELS uses parent ratings to measure children’s literacy skills in five areas, including letter recognition, counting, name writing, and primary color identification. The authors reported, and the WWC confirmed, a statistically significant positive effect of Head Start on children’s PELS ratings for both the 3-year-old and 4-year-old cohorts at the end of the intervention year. The WWC characterizes these study findings as a statistically significant positive effect.

Thus, for the general reading achievement domain, one study showed statistically significant positive effects, and 
no studies showed an indeterminate effect or a statistically significant or substantively important negative effect. 
This results in a rating of potentially positive effects, with a small extent of evidence. 
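The rating in this paragraph, and the no-discernible-effects ratings for the other two domains, follow from a decision rule over how each study's findings are characterized. The sketch below is a simplified, hypothetical encoding that assumes the WWC rating criteria (summarized on p. 34) as they apply to a domain supported by exactly one study; multi-study ratings such as positive or mixed effects are deliberately not modeled.

```python
from __future__ import annotations

from enum import Enum


class Finding(Enum):
    """How the WWC characterizes one study's findings within a domain."""
    POSITIVE = "statistically significant or substantively important positive effect"
    INDETERMINATE = "indeterminate effect"
    NEGATIVE = "statistically significant or substantively important negative effect"


def rate_single_study_domain(findings: list[Finding]) -> str:
    """Rating for a domain supported by exactly one study (simplified sketch)."""
    if len(findings) != 1:
        raise ValueError("this sketch only covers domains with exactly one study")
    finding = findings[0]
    if finding is Finding.POSITIVE:
        # "Positive effects" requires two or more studies, so one positive
        # study can reach at most "potentially positive effects".
        return "Potentially positive effects"
    if finding is Finding.NEGATIVE:
        return "Potentially negative effects"
    # Neither significant nor substantively important in either direction.
    return "No discernible effects"


# Characterizations of the DHHS ACF (2010) findings in this report.
for domain, finding in [
    ("General reading achievement", Finding.POSITIVE),
    ("Mathematics achievement", Finding.INDETERMINATE),
    ("Social-emotional development", Finding.INDETERMINATE),
]:
    print(f"{domain}: {rate_single_study_domain([finding])}")
```

Restricting the sketch to one study keeps it honest: with a single study, the count-based comparisons in the full criteria collapse to a direct mapping from the study's characterization to a rating.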


Table 3.1 Rating of effectiveness and extent of evidence for the general reading achievement domain 


Rating of effectiveness: Potentially positive effects
Criteria met: Evidence of a positive effect with no overriding contrary evidence. In the one study that reported findings, the estimated impact of the intervention on outcomes in the general reading achievement domain was positive and statistically significant.

Extent of evidence: Small
Criteria met: One study that included 3,697 children ages 3 and 4 in a nationally representative sample reported evidence of effectiveness in the general reading achievement domain.
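The small extent of evidence follows from the number of studies rather than the sample size. As a minimal sketch, assuming the WWC criteria in use for this report (version 3.0 of the WWC procedures and standards), a domain is rated medium to large only when it includes more than one study, more than one school, and a total sample of at least 350 students (or at least 14 classrooms); otherwise it is small. The function is hypothetical, and treating the study's 378 Head Start centers as schools is an illustrative assumption.

```python
def extent_of_evidence(studies: int, schools: int, students: int, classrooms: int = 0) -> str:
    """Simplified extent-of-evidence rule under assumed WWC version 3.0 criteria.

    Medium to large requires more than one study, more than one school, and
    either at least 350 students or at least 14 classrooms; anything else is small.
    """
    enough_sample = students >= 350 or classrooms >= 14
    if studies > 1 and schools > 1 and enough_sample:
        return "Medium to large"
    return "Small"


# General reading achievement: one study with 3,697 children. Treating the
# 378 sampled Head Start centers as "schools" is an illustrative assumption.
print(extent_of_evidence(studies=1, schools=378, students=3697))
```

Even with several thousand children and hundreds of centers, the single-study condition alone yields "Small", which is why all three domains in this report share that rating.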


Summary of effectiveness for the mathematics achievement domain 

One study that meets WWC group design standards without reservations reported findings in the mathematics 
achievement domain. 

The DHHS ACF (2010) study reported one outcome in the mathematics achievement domain that meets WWC group design standards without reservations: the Counting Bears Test. The Counting Bears Test measures counting ability and understanding of one-to-one correspondence. Impacts for this outcome were presented only for the 4-year-old cohort. The authors reported, and the WWC confirmed, no statistically significant or substantively important difference between the Head Start and comparison groups on this measure. The WWC characterizes this study finding as an indeterminate effect.
Thus, for the mathematics achievement domain, one study showed an indeterminate effect, and no studies showed a statistically significant or substantively important positive or negative effect. This results in a rating of no discernible effects, with a small extent of evidence.


Table 3.2 Rating of effectiveness and extent of evidence for the mathematics achievement domain 


Rating of effectiveness: No discernible effects
Criteria met: No affirmative evidence of effects. In the one study that reported findings, the estimated impact of the intervention on outcomes in the mathematics achievement domain was neither statistically significant nor large enough to be substantively important.

Extent of evidence: Small
Criteria met: One study that included 1,617 children in the 4-year-old cohort of a nationally representative sample reported evidence of effectiveness in the mathematics achievement domain.


Summary of effectiveness for the social-emotional development domain 

One study that meets WWC group design standards without reservations reported findings in the social-emotional 
development domain. 

The DHHS ACF (2010) study reported three outcomes in the social-emotional development domain that meet WWC group design standards without reservations: the Total Problem Behavior Scale, 15 the Social Competencies Checklist, and the Social Skills and Positive Approaches to Learning Scale. The authors reported, and the WWC confirmed, no statistically significant or substantively important effects on any of these measures for either the 3-year-old or the 4-year-old cohort. The WWC characterizes these study findings as an indeterminate effect. 16

Thus, for the social-emotional development domain, one study showed an indeterminate effect, and no studies 
showed a statistically significant or substantively important positive or negative effect. This results in a rating of no 
discernible effects, with a small extent of evidence. 


Table 3.3 Rating of effectiveness and extent of evidence for the social-emotional development domain 


Rating of effectiveness: No discernible effects
Criteria met: No affirmative evidence of effects. In the one study that reported findings, the estimated impact of the intervention on outcomes in the social-emotional development domain was neither statistically significant nor large enough to be substantively important.

Extent of evidence: Small
Criteria met: One study that included 3,693 children ages 3 and 4 in a nationally representative sample reported evidence of effectiveness in the social-emotional development domain.
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cannot be attributed solely to the intervention— there was only one unit assigned to one or both conditions. 
Studies that are ineligible for review using the Early Childhood Education Evidence Review Protocol

Anderson, K., Foster, J., & Frisvold, D. (2010). Investing in health: The long-term impact of Head Start on smoking. 
Economic Inquiry, 48(3), 587-602. The study is ineligible for review because it does not include an outcome 
within a domain specified in the protocol. 

Anderson, L. M., Shinn, C., Fullilove, M. T., Scrimshaw, S. C., Fielding, J. E., Normand, J., ... U.S. Task Force on Community Preventive Services. (2003). The effectiveness of early childhood development programs: A systematic review. American Journal of Preventive Medicine, 24(Suppl. 3), 32-46. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Barnett, W. S., & Hustedt, J. T. (2005). Head Start’s lasting benefits. Infants & Young Children: An Interdisciplinary Journal of Special Care Practices, 18(1), 16-24. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Bernstein, S. (2013). Child care choice: Parental processes and consequences for research (Doctoral dissertation). 
Available from ProQuest Dissertations and Theses database. (UMI No. 3527516) The study is ineligible for 
review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or 
research literature review. 

Carneiro, P., & Ginja, R. (2014). Long-term impacts of compensatory preschool on health and behavior: Evidence from Head Start. American Economic Journal: Economic Policy, 6(4), 135-173. The study is ineligible for review because it does not examine an intervention implemented in a way that falls within the scope of the review.

Chambers, B., Cheung, A., Slavin, R. E., Smith, D., & Laurenzano, M. (2010). Effective early childhood education programs: A systematic review. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Cole, O. J., & Washington, V. (1986). A critical analysis of the assessment of the effects of Head Start on minority 
children. Journal of Negro Education, 55(1), 91-106. The study is ineligible for review because it is a secondary 
analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review. 

Crahay, M. (1991). Childcare and preschool effects: A review of Anglo-Saxon evaluative studies related to compensatory education and preschool education. Liege, Belgium: University of Liege. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Educational Research Service. (1995). Head Start. Arlington, VA: Author. The study is ineligible for review because 
it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature 
review. 

Felix, C., & Frisvold, D. (2010). Health outcomes from Head Start participation. In D. Slottje & R. Tchernis (Eds.), Current issues in health economics: Contributions to economic analysis (pp. 115-138). Bradford, UK: Emerald Group Publishing. The study is ineligible for review because it does not include an outcome within a domain specified in the protocol.

Gamble, T. J., & Zigler, E. (1989). The Head Start synthesis project: A critique. Journal of Applied Developmental Psychology, 10(2), 267-274. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Garces, E., Currie, J., & Thomas, D. (2002). Longer term effects of Head Start. The American Economic Review, 
92(4), 999-1012. The study is ineligible for review because it does not include an outcome within a domain 
specified in the protocol. 

Additional source: 

Garces, E., Thomas, D., & Currie, J. (2000). Longer term effects of Head Start (Unpublished manuscript). Department of Economics, University of California at Los Angeles.
Gibbs, C., Ludwig, J., & Miller, D. L. (2011). Does Head Start do any lasting good? (NBER Working Paper 17452). Cambridge, MA: National Bureau of Economic Research. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Gilliam, W. S., & Zigler, E. F. (2001). A critical meta-analysis of all evaluations of state-funded preschool from 1977 to 1998: Implications for policy, service delivery and program evaluation. Early Childhood Research Quarterly, 15(4), 441-473. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Gorey, K. M. (2001). Early childhood education: A meta-analytic affirmation of the short- and long-term benefits of educational opportunity. School Psychology Quarterly, 16(1), 9-30. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Greenfader, C. M., & Miller, E. B. (2014). The role of access to Head Start and quality ratings for Spanish-speaking 
dual language learners’ (DLLs) participation in early childhood education. Early Childhood Research Quarterly, 
29(3), 378-388. The study is ineligible for review because it does not use a comparison group design or a 
single-case design. 

Haas, L. E. (2011). Formal and informal measures of reading and math achievement as a function of early childhood program participation among kindergarten through eighth grade students (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3484769) The study is ineligible for review because it does not use a comparison group design or a single-case design.

Hale, B., Seitz, V., & Zigler, E. (1990). Health services and Head Start: A forgotten formula. Journal of Applied Developmental Psychology, 11(4), 447-458. The study is ineligible for review because it does not include an outcome within a domain specified in the protocol.

Herman, A. D., & Mayer, G. G. (2004). Reducing the use of emergency medical resources among Head Start families: A pilot study. Journal of Community Health: The Publication for Health Promotion and Disease Prevention, 29(3), 197-208. The study is ineligible for review because it does not use a comparison group design or a single-case design.

Isaacs, J. B. (2008). Impacts of early childhood programs. Washington, DC: The Brookings Institution and First Focus. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Jessup, P. A. (2008). Learning research: Insights from Head Start. Journal of Early Childhood Research, 6(1), 51-57. The study is ineligible for review because it does not use a comparison group design or a single-case design.

Lee, V. E., & Loeb, S. (1995). Where do Head Start attendees end up? One reason why preschool effects fade out. Educational Evaluation and Policy Analysis, 17(1), 62-82. The study is ineligible for review because it does not include an outcome within a domain specified in the protocol.

Li, W. (2014). Center-based early childhood education: Curriculum, implementation, and intensity (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3564827) The study is ineligible for review because it does not use a comparison group design or a single-case design.

Love, J. M., Chazan-Cohen, R., & Raikes, H. (2007). Forty years of research knowledge and use: From Head Start to Early Head Start and beyond. In J. L. Aber, S. J. Bishop-Josef, S. M. Jones, K. T. McLearn, & D. A. Phillips (Eds.), Child development and social policy: Knowledge for action (pp. 79-95). Washington, DC: American Psychological Association. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Love, J. M., Grover, J., & RMC Research Corp. (1987). Study of Head Start recruitment and enrollment: Final report. 
Washington, DC: U.S. Department of Health and Human Services, Administration for Children, Youth and 
Families. http://files.eric.ed.gov/fulltext/ED283607.pdf. The study is ineligible for review because it does not 
use a comparison group design or a single-case design. 
Ludwig, J., & Phillips, D. (2008). The long-term effects of Head Start on low-income children. Annals of the New
York Academy of Sciences, 40, 1-12. The study is ineligible for review because it is a secondary analysis of 
the effectiveness of an intervention, such as a meta-analysis or research literature review. 

Ludwig, J., & Phillips, D. A. (2007). The benefits and costs of Head Start (NBER Working Paper 12973). Cambridge, 
MA: National Bureau of Economic Research. http://files.eric.ed.gov/fulltext/ED521701.pdf. The study is 
ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta- 
analysis or research literature review. 

Lumeng, J., Kaciroti, N., & Frisvold, D. (2010). Changes in body mass index Z score over the course of the academic year among children attending Head Start. Academic Pediatrics, 10(3), 179-186. The study is ineligible for review because it does not include an outcome within a domain specified in the protocol.

McGroder, S. M. (1990). Head Start: What do we know about what works. Washington, DC: U.S. Dept. of Health and Human Services. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

McKey, R. H. (1985). The impact of Head Start on children, families and communities. Final report of the Head Start 
evaluation, synthesis and utilization project. Washington, DC: U.S. Department of Health and Human Services, 
Administration for Children, Youth and Families. http://files.eric.ed.gov/fulltext/ED263984.pdf. The study is 
ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta- 
analysis or research literature review. 

Mervis, J. (2011). Giving children a Head Start is possible— but it’s not easy. Science, 333(6045), 956-957. The 
study is ineligible for review because it does not use a comparison group design or a single-case design. 

Nielsen, W. L. (1989). The longitudinal effects of Project Head Start on students’ overall academic success: A review of the literature. International Journal of Early Childhood, 21(1), 35-42. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Research-based responses to key questions about the 2010 Head Start impact study. (2011). Child Trends: Early Childhood Highlights, 2(1). The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Reynolds, A. J. (Ed.). (2010). Childhood programs and practices in the first decade of life: A human capital integration. New York: Cambridge University Press. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Schweinhart, L. J. (2003). The three types of early childhood programs in the United States. In A. J. Reynolds (Ed.), Early childhood programs for a new century (pp. 241-254). Washington, DC: Child Welfare League of America, Inc. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Schweinhart, L. J., & ERIC Clearinghouse on Elementary and Early Childhood Education. (2001). Recent evidence on preschool programs. ERIC digest. Champaign, IL: ERIC Clearinghouse on Elementary and Early Childhood Education. http://files.eric.ed.gov/fulltext/ED458046.pdf. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Schweinhart, L. J., & Weikart, D. P. (1986). What do we know so far? A review of the Head Start synthesis project. Young Children, 41(2), 49-55. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Shager, H. M., Schindler, H. S., Magnuson, K. A., Duncan, G. J., Yoshikawa, H., & Hart, C. M. D. (2013). Can research design explain variation in Head Start research results? A meta-analysis of cognitive and achievement outcomes. Educational Evaluation and Policy Analysis, 35(1), 76-95. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.
Shimoni, R. (1990). A historical overview of the development of early childhood services. http://files.eric.ed.gov/fulltext/ED334000.pdf. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Sprigle, J. E., & Schaefer, L. (1985). Longitudinal evaluation of the effects of two compensatory preschool programs on fourth- through sixth-grade students. Developmental Psychology, 21(4), 702. The study is ineligible for review because it does not include an outcome within a domain specified in the protocol.

Swadener, B. B., Dunlap, S. K., & Nespeca, S. M. (1995). Family literacy and social policy: Parent perspectives and 
policy implications. Reading & Writing Quarterly, 11(3), 267-283. The study is ineligible for review because it 
does not use a comparison group design or a single-case design. 

U.S. Department of Health and Human Services, Administration for Children, Youth and Families, Head Start Bureau. (1993). Head Start: A child development program. Washington, DC: Author. The study is ineligible for review because it does not use a comparison group design or a single-case design.

U.S. Department of Health and Human Services, Office of Inspector General. (1993). Evaluating Head Start expansion through performance indicators. OEI-09-91-00762. Washington, DC: U.S. Department of Health and Human Services. The study is ineligible for review because it does not use a comparison group design or a single-case design.

U.S. General Accounting Office. (2000). Early childhood programs: Characteristics affect the availability of school readiness information. GAO/HEHS-00-38. Washington, DC: Author. Retrieved from http://www.gao.gov/assets/230/228763.pdf. The study is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis or research literature review.

Whiteside-Mansell, L., Bradley, R., McKelvey, L., & Lopez, M. (2009). Center-based Early Head Start and children 
exposed to family conflict. Early Education and Development, 20(6), 942-957. The study is ineligible for review 
because it does not include an outcome within a domain specified in the protocol. 

Zeece, P. D., & Wang, A. (1998). Effects of the family empowerment and transitioning program on child and family 
outcomes. Child Study Journal, 28(3), 161-178. The study is ineligible for review because it does not examine 
an intervention implemented in a way that falls within the scope of the review— the intervention is bundled 
with other components. 

Zhai, F., Waldfogel, J., & Brooks-Gunn, J. (2013). Head Start, prekindergarten, and academic school readiness:
A comparison among regions in the United States. Journal of Social Service Research, 39(3), 345-364. The
study is ineligible for review because it does not examine an intervention implemented in a way that falls
within the scope of the review.

Zigler, E. F., & Styfco, S. J. (1996). Head Start and early childhood intervention: The changing course of social
science and social policy. In E. F. Zigler, S. L. Kagan, & N. W. Hall (Eds.), Children, families, and government:
Preparing for the twenty-first century (pp. 132-155). New York: Cambridge University Press. The study is ineligible
for review because it is a secondary analysis of the effectiveness of an intervention, such as a meta-analysis
or research literature review.

Zigler, E. F., & Styfco, S. J. (2003). The federal commitment to preschool education: Lessons from and for Head 
Start. In A. J. Reynolds (Ed.), Early childhood programs for a new century (pp. 3-33). Washington, DC: Child
Welfare League of America, Inc. The study is ineligible for review because it is a secondary analysis of the 
effectiveness of an intervention, such as a meta-analysis or research literature review. 

Zigler, E. F., & Styfco, S. J. (2004). The Head Start debates. Baltimore, MD: Brookes Publishing Company. The study 
is ineligible for review because it is a secondary analysis of the effectiveness of an intervention, such as a 
meta-analysis or research literature review. 
Appendix A: Research details for DHHS ACF (2010)

U.S. Department of Health and Human Services, Administration for Children and Families. (2010). Head 
Start impact study. Final report. Washington, DC. http://files.eric.ed.gov/fulltext/ED507845.pdf. 17 

Additional sources: 

Puma, M., Bell, S., Cook, R., Heid, C., Broene, P., Jenkins, F., & Downer, J. (2012). Third grade
follow-up to the Head Start impact study final report. Washington, DC: U.S. Department of
Health and Human Services Office of Planning, Research and Evaluation.
http://files.eric.ed.gov/fulltext/ED539264.pdf.

U.S. Department of Health and Human Services, Administration for Children and Families. (2005).
Head Start impact study: First year findings. Washington, DC.
http://files.eric.ed.gov/fulltext/ED543015.pdf.


Table A. Summary of findings

Meets WWC group design standards without reservations

Study findings:

| Outcome domain | Sample size | Average improvement index (percentile points) | Statistically significant |
|---|---|---|---|
| General reading achievement | 3,697 children | +13 | Yes |
| Mathematics achievement | 1,617 children | +3 | No |
| Social-emotional development | 3,693 children | +1 | No |


Setting

The study was conducted using a nationally representative sample of Head Start programs
in the United States. 18

Head Start programs in Puerto Rico were included in the original sample, but were analyzed 
separately and are not included in this report, since the assessments were provided only in 
Spanish, which is outside the scope of the review protocol for the Early Childhood Education 
topic area (version 3.0). 

Study sample

The sample was created using a multistage process. All Head Start and delegate agencies
in fiscal year 1998-99 were stratified by geography and demographic characteristics, and 
a random sample was then drawn from this list. 19 Head Start and delegate agencies were 
excluded from the sample pool if they were new grantees, were administered by American 
Indian/Alaska Native tribal organizations, had participated in the Head Start Family and Child 
Experiences Survey (FACES) 2000, 20 ran programs that were exclusively Early Head Start or 
Migrant and Seasonal Head Start, or had operated in communities in which most children 
participated in Head Start (this is because these programs were “saturated,” meaning there 
would be a limited chance of forming a comparison group, since so many children were 
already being served). 

Subsequently, eligible Head Start programs were randomly sampled from within delegate 
agencies. Similar to the criteria used above to exclude Head Start and delegate agencies, 
Head Start programs were excluded from the sample pool if they were saturated, had closed 
or merged with another program, were jointly operated with a non-Head Start agency (e.g., a
private preschool program), or were exclusively Early Head Start or Migrant and Seasonal 
Head Start programs. 
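The two sampling stages described above (screen and stratify agencies, then sample eligible programs within the sampled agencies) can be sketched as follows. This is a minimal illustration, not the study's sampling code: the record field names, the equal per-stratum allocation, and the fixed seed are all hypothetical assumptions, and the study's actual stratification variables and sample sizes are not reproduced.

```python
import random
from collections import defaultdict

def eligible_agency(agency):
    # Agency-level exclusions listed in the text (field names are hypothetical).
    return not (agency["new_grantee"] or agency["tribal"] or agency["in_faces_2000"]
                or agency["ehs_or_migrant_only"] or agency["saturated"])

def eligible_program(program):
    # Program-level exclusions listed in the text (field names are hypothetical).
    return not (program["saturated"] or program["closed_or_merged"]
                or program["co_operated"] or program["ehs_or_migrant_only"])

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Draw up to `per_stratum` units at random, without replacement, from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    chosen = []
    for key in sorted(strata):
        members = strata[key]
        chosen.extend(rng.sample(members, min(per_stratum, len(members))))
    return chosen

def sample_programs(agencies, programs_by_agency, agencies_per_stratum,
                    programs_per_agency, seed=0):
    """Stage 1: screen and sample agencies by stratum. Stage 2: sample programs within them."""
    rng = random.Random(seed)
    pool = [a for a in agencies if eligible_agency(a)]
    sampled_agencies = stratified_sample(
        pool, lambda a: (a["region"], a["demo_group"]), agencies_per_stratum, rng)
    sampled_programs = []
    for agency in sampled_agencies:
        candidates = [p for p in programs_by_agency.get(agency["id"], [])
                      if eligible_program(p)]
        sampled_programs.extend(
            rng.sample(candidates, min(programs_per_agency, len(candidates))))
    return sampled_agencies, sampled_programs
```

Applying the exclusions before drawing the sample, as the text describes, guarantees that no excluded agency or program can be selected at either stage.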
Finally, children were randomly selected from the applicant pool of each program and
then randomly assigned either to be offered Head Start or to be in the comparison group.
Programs were allowed to exclude a limited number of children from the random assignment
process if they were thought to be “high-risk” and in particular need of Head Start services.
These children were not included in the impact analysis.

The study design and presentation of findings focused on two cohorts of children:

Three-year-old cohort: Children who were 3 years old when applying to Head Start. The
baseline sample included 383 Head Start programs, 1,466 children in the Head Start group, and
988 children in the comparison group. The analytic sample (after attrition) included at most
2,062 children (1,278 intervention and 784 comparison) for the general reading achievement
and social-emotional development outcome domains.

Four-year-old cohort: Children who were 4 years old when applying to Head Start. The
baseline sample included 383 Head Start programs, 1,192 children in the Head Start group, and
815 children in the comparison group. The analytic sample (after attrition) included at most
1,635 children (1,008 intervention and 627 comparison), depending on the outcome.

Children in both cohorts were followed through the spring of third grade. Approximately 50%
of the 3-year-old children who were originally assigned to the comparison group enrolled in
Head Start as 4-year-olds. As a result, the intended contrast between the Head Start and
comparison conditions was not maintained after the first year. Impacts for the 3-year-old
cohort measured after the first year of Head Start therefore do not test the effectiveness of
Head Start and are not included in this intervention report.

Intervention group

Head Start includes diverse program models, and so did the study’s intervention group:
center-based programs with home visits (the most common type), programs in which Head
Start staff visited families at their homes, family child care programs, and programs that
combined these models.

Individual Head Start programs maintained their standard practices during the study. The
programs in the Head Start group varied in their quality, the specific types of services
provided, and the number of months and hours the programs were available. Children’s
attendance levels also varied.

Comparison group

Parents of children in the comparison group were free to enroll their children in any program
other than the Head Start programs in the study (or to not enroll them in any program).
Consequently, children in the comparison group experienced diverse types of early care
and education settings, ranging from parent-only care to programs that were similar in type
and services to Head Start.

Authors reported that 17.3% of the 3-year-old baseline comparison group were enrolled in 
Head Start programs that were not part of the study in spite of their study group assignment. 21 
Authors reported that 13.9% of the 4-year-old baseline comparison group were enrolled in 
Head Start programs that were not part of the study in spite of their study group assignment. 22 
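The baseline and analytic sample counts reported above for the two cohorts can be turned into attrition rates with simple arithmetic, which is how overall and differential attrition are usually expressed for a randomized study. The sketch below is illustrative only: because the analytic samples are described as "at most" these sizes, the rates it produces are lower bounds, and the crossover head counts are approximations back-calculated from the reported percentages (the function names are hypothetical).

```python
# Baseline and analytic counts as reported above: (Head Start group, comparison group).
COHORTS = {
    "3-year-old": {"baseline": (1466, 988), "analytic": (1278, 784)},
    "4-year-old": {"baseline": (1192, 815), "analytic": (1008, 627)},
}

def attrition(baseline, analytic):
    """Return overall, intervention, comparison, and differential attrition rates."""
    (base_i, base_c), (kept_i, kept_c) = baseline, analytic
    overall = 1 - (kept_i + kept_c) / (base_i + base_c)
    rate_i = 1 - kept_i / base_i
    rate_c = 1 - kept_c / base_c
    return overall, rate_i, rate_c, abs(rate_i - rate_c)

def crossovers(baseline_comparison, percent_enrolled):
    """Approximate head count implied by a reported crossover percentage."""
    return round(baseline_comparison * percent_enrolled / 100)

for name, counts in COHORTS.items():
    overall, rate_i, rate_c, diff = attrition(counts["baseline"], counts["analytic"])
    print(f"{name}: overall {overall:.1%}, Head Start {rate_i:.1%}, "
          f"comparison {rate_c:.1%}, differential {diff:.1%}")

print(crossovers(988, 17.3), crossovers(815, 13.9))
```

With these counts, overall attrition is at least 16.0% for the 3-year-old cohort and 18.5% for the 4-year-old cohort, and differential attrition is roughly 8 percentage points in each.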
Outcomes and measurement

The following outcome measures were administered to both the 3-year-old and 4-year-old
cohorts at the end of the intervention year for contrasts that meet WWC group design
standards without reservations. In the general reading achievement domain, the authors
used the Parent Emergent Literacy Scale (PELS). In the social-emotional development domain,
the authors used the Total Problem Behavior Scale, the Social Competencies Checklist, and the
Social Skills and Positive Approaches to Learning Scale. In the mathematics achievement
domain, the authors used the Counting Bears Test; this contrast meets WWC group design
standards only for the 4-year-old cohort. For a more detailed description of these outcome
measures, see Appendix B.

Subtests for the 3- and 4-year-old cohorts, and outcomes after kindergarten, first grade, and
third grade for the 4-year-old cohort, are presented as supplemental findings. The supplemental
findings do not factor into the intervention’s ratings of effectiveness.

Support for implementation

The study did not report information on the support or professional development offered in
Head Start programs.
Appendix B: Outcome measures for each domain


General reading achievement 

Early Childhood Longitudinal 
Study-Kindergarten (ECLS-K)* 

The ECLS-K is a standardized measure of children’s reading skills including comprehension, decoding, 
and vocabulary (as cited in DHHS ACF, 2010). 

Parent Emergent Literacy Scale (PELS) 

The PELS measures children’s skills in five areas using parent ratings. The areas include letter recognition, 
counting, name writing (real or pretend), and primary color identification. The instrument was developed for 
the FACES 2000 study (as cited in DHHS ACF, 2010). 

Mathematics achievement 

Counting Bears Test 

The Counting Bears Test measures children’s counting abilities and their understanding of one-to-one 
correspondence. A Spanish translation of this test was also used in the study. This task was adapted for the 
FACES study (U.S. Department of Health and Human Services, 2001) from the Comprehensive Assessment 
Program (CAP) Early Childhood Diagnostic Instrument (Mason & Stewart, 1989; as cited in DHHS ACF, 2010). 

Woodcock-Johnson III Tests of Achievement (WJ-III), Applied Problems Subtest*a

The WJ-III, Applied Problems Subtest is a standardized measure of children’s abilities to solve mathematical
problems presented orally (Woodcock et al., 2001; as cited in DHHS ACF, 2010).

WJ-III, Calculation Subtest*a

The WJ-III, Calculation Subtest is a standardized measure of children's knowledge of numbers and abilities
to perform calculations (Woodcock et al., 2001; as cited in DHHS ACF, 2010).

WJ-III, Math Reasoning Test*a

The WJ-III, Math Reasoning Test is a composite of two subtests (Applied Problems and Quantitative Concepts).
It is a standardized measure of children's mathematical knowledge and reasoning (as cited in DHHS ACF, 2010).

WJ-III, Quantitative Concepts Test*a

The WJ-III, Quantitative Concepts Test is a composite of two subtests (Concepts and Number Series). It is a 
standardized measure of children’s abilities to count, identify shapes, patterns, numbers, and series, as well 
as their knowledge of mathematical concepts and terms (Woodcock, McGrew, & Mather, 2001; as cited in 
DHHS ACF, 2010). 

Social-emotional development 

Adjustment Scales for Preschool Intervention (ASPI), Aggressive Behavior Dimension*

The ASPI Aggressive Behavior Dimension measures aggressive behavior using 22 items, based on teacher
report. The teacher is asked to choose behaviors (out of 144) that the child demonstrated during specific
types of classroom situations over the previous 2 months. The score is the number of behaviors indicated
in each dimension. The ASPI is based on the Adjustment Scales for Children and Adolescents (ASCA) (as
cited in DHHS ACF, 2010).

ASPI, Inattentive/Hyperactive 
Dimension* 

The ASPI Inattentive/Hyperactive Dimension measures the extent to which children demonstrate 10 behaviors 
associated with inattention, impulsivity, or hyperactivity (as cited in DHHS ACF, 2010). 

ASPI, Oppositional Dimension* 

The ASPI Oppositional Dimension measures the extent to which children demonstrate 11 behaviors associated 
with moodiness and controlling behaviors (as cited in DHHS ACF, 2010). 

ASPI, Problems with Peer 
Interaction Dimension* 

The ASPI Problems with Peer Interaction Dimension measures the extent to which children demonstrate 
24 problem behaviors over the course of six types of peer situations (as cited in DHHS ACF, 2010). 

ASPI, Problems with Structured 
Learning Dimension* 

The ASPI Problems with Structured Learning Dimension measures the extent to which children demonstrate 
40 problem behaviors over the course of seven types of structured classroom situations (as cited in DHHS ACF, 
2010). 

ASPI, Problems with Teacher 
Interaction Dimension* 

The ASPI Problems with Teacher Interaction Dimension measures the extent to which children demonstrate 30 
problem behaviors over the course of six types of classroom situations that include teachers (as cited in DHHS 
ACF, 2010). 

ASPI, Socially Reticent Dimension* 

The ASPI Socially Reticent Dimension measures the extent to which children demonstrate 12 behaviors 
associated with shyness and hesitancy (as cited in DHHS ACF, 2010). 

ASPI, Withdrawn-Low Energy Behavior 
Dimension* 

The ASPI Withdrawn-Low Energy Dimension measures the extent to which children demonstrate 18 behaviors 
associated with lack of energy and activity (as cited in DHHS ACF, 2010). 
Social Competencies Checklist

The Social Competencies Checklist measures children's social skills based on parents’ reports of the extent
to which children exhibit 12 behaviors or characteristics (e.g., “takes care of personal belongings”; as cited in
DHHS ACF, 2010).

Social Skills and Positive 
Approaches to Learning Scale 

The Social Skills and Positive Approaches to Learning Scale measures children’s social skills (e.g., cooperative 
and empathic behavior) and approaches to learning (e.g., curiosity, imagination, openness to challenges, and 
positive attitudes about gaining skills and knowledge). It is a researcher-developed composite of parents' ratings 
on seven items. The measure is based on one used in the FACES study (U.S. Department of Health and Human 
Services, 2001) that was in turn based on a modified Achenbach Classroom Behavior Checklist (Achenbach, 
Edelbrock, & Howell, 1987; as cited in DHHS ACF, 2010).

Total Problem Behavior Scale (TPBS) 

The Total Problem Behavior Scale measures the extent to which children exhibit problem behaviors. It is a 
composite created from parents' ratings that combines items in three subscales measuring: aggressive or 
defiant behavior; inattentive or hyperactive behavior; and shy, withdrawn, or depressed behavior. Parents are 
asked to judge whether the behavioral description offered in each of 14 items is “not true,” “sometimes true,”
or “very true” of the child (as cited in DHHS ACF, 2010). 
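A composite like the Total Problem Behavior Scale is typically scored by coding each response option numerically and summing across items. The sketch below shows that idea for the 14 parent ratings described above; the 0/1/2 coding is an assumption made for illustration (the source does not state the scoring key), and the function name is hypothetical.

```python
# Assumed numeric coding of the three response options (not stated in the source).
RATING_CODES = {"not true": 0, "sometimes true": 1, "very true": 2}
N_ITEMS = 14

def tpbs_total(responses):
    """Sum coded parent ratings across the 14 items (higher = more problem behavior)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(responses)}")
    return sum(RATING_CODES[r.strip().lower()] for r in responses)

# Example: ten "not true", three "sometimes true", and one "very true" rating.
example = ["not true"] * 10 + ["sometimes true"] * 3 + ["very true"]
print(tpbs_total(example))
```

Because higher totals indicate more problem behavior, this is the kind of measure flagged “(negative)” in the findings tables, where a lower score is the desirable outcome.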

TPBS, Aggressive Behavior Subscale*

The Aggressive Behavior Subscale of the Total Problem Behavior Scale measures the extent to which children 
exhibit aggressive or defiant behavior. It is a composite created from parents’ ratings of children’s behavior on 
four items (as cited in DHHS ACF, 2010). 

TPBS, Hyperactive Behavior Subscale* 

The Hyperactive Behavior Subscale of the Total Problem Behavior Scale measures the extent to which children 
exhibit hyperactive or inattentive behavior. It is a composite created from parents’ ratings of children’s behavior 
on three items (as cited in DHHS ACF, 2010). 

TPBS, Withdrawn Behavior Subscale*b

The Withdrawn Behavior Subscale of the Total Problem Behavior Scale measures the extent to which children 
exhibit shy, withdrawn, or depressed behavior. It is a composite created from parents’ ratings of children's 
behavior on three items (as cited in DHHS ACF, 2010). 

Alphabetics 

Letter identification construct 

Letter Naming Task* 

The Letter Naming Task is a measure of children’s abilities to recognize letters of the alphabet 
(as cited in DHHS ACF, 2010). 

Phonological awareness construct 

Preschool Comprehensive Test of 
Phonological and Print Processing 
(Pre-CTOPPP), Elision Subtest* 

The Pre-CTOPPP Elision Subtest is a standardized measure of children's ability to identify and manipulate 
sounds in spoken words (Lonigan, Wagner, Torgesen, & Rashotte, 2002; as cited in DHHS ACF, 2010). 

Phonics construct 

WJ-III, Basic Reading Skills Test*a

The WJ-III, Basic Reading Skills Test is a composite of two subtests, Letter-Word Identification and Word Attack.
It is a standardized measure of children’s abilities to use phonics, recognize words by sight, and use structural
analysis (Woodcock et al., 2001; as cited in DHHS ACF, 2010).

WJ-III, Letter-Word Identification Subtest*a

The WJ-III Letter-Word Identification Subtest is a standardized measure of children’s abilities to identify letters 
and words in English (Woodcock et al., 2001; as cited in DHHS ACF, 2010). 

WJ-III, Spelling Subtest*a

The WJ-III Spelling Subtest is a standardized measure of children’s abilities to write letters and words presented 
orally in English as well as pre-writing skills such as line-drawing and letter copying (Woodcock et al., 2001; 
as cited in DHHS ACF, 2010). 

WJ-III, Word Attack Subtest*a

The WJ-III Word Attack Subtest is a standardized measure of children’s abilities to produce the sounds
associated with letters and to read real and nonsense words aloud using phonics and structural analysis
(Woodcock et al., 2001; as cited in DHHS ACF, 2010).

Cognition 

WJ-III, Academic Applications*a

The WJ-III, Academic Applications is a composite of three subtests: Passage Comprehension, Applied 
Problems, and Writing Samples. It is a standardized measure of children’s use of academic skills 
(as cited in DHHS ACF, 2010). 
WJ-III, Academic Skills*a

The WJ-III, Academic Skills is a composite of three subtests: Letter-Word Identification, Spelling, and
Calculation. It is a standardized measure of children's basic academic skills in the areas of decoding,
mathematical calculations, and spelling (as cited in DHHS ACF, 2010).

WJ-III, Pre-Academic Skills*a

The WJ-III, Pre-Academic Skills is a composite of three subtests: Letter-Word Identification, Spelling, and 
Applied Problems (as cited in DHHS ACF, 2010). 

Comprehension 

Reading comprehension construct 

WJ-III, Passage Comprehension Subtest*a

The WJ-III Passage Comprehension Subtest is a standardized measure of children’s abilities to provide missing
words in passages using contextual information provided in the passages. The assessment begins by testing
the ability to match symbols of objects with pictures of them and progresses through passages with increasing
levels of difficulty in terms of length, vocabulary, and semantic complexity (Woodcock et al., 2001; as cited in
DHHS ACF, 2010). 

Vocabulary development construct 

Peabody Picture Vocabulary Test, Third 
Edition-Adapted (PPVT-Adapted)* 

The PPVT-Adapted is a standardized measure of receptive vocabulary in which children point to the picture
that represents the meaning of a named object or action. The study employed a shortened version of this
assessment, developed using maximum likelihood Item Response Theory (Dunn, Dunn, & Dunn, 1997; as cited
in DHHS ACF, 2010).

Language development 

WJ-III, Oral Comprehension Subtest*a

The WJ-III Oral Comprehension Subtest is a standardized measure of children’s abilities to comprehend a short 
orally-presented passage. In the assessment, children are asked to fill in missing words using syntactic and 
semantic information in the passage (as cited in DHHS ACF, 2010). 


* This outcome is considered a supplemental outcome because it was either a subtest or an outcome that was measured after the intervention year, so it does not represent an 
immediate impact of the intervention. For this reason, contrasts for these outcomes that meet WWC group design standards without reservations are presented in Appendix D, but do 
not contribute to the rating of effectiveness or the extent of evidence.

a The rule for ending the administration of all WJ-III subscales was changed from the standard ceiling rule (stop the test after six incorrect answers have been given) to a lower ceiling
rule (stop after three incorrect responses). The change was made so that the results from the current study would be comparable to those of the FACES study (U.S. Department of
Health and Human Services, 2001; as cited in DHHS ACF, 2010), which used the adapted rules to reduce the test burden on the young children being tested. Standard ceiling rules
were used for the first-grade test administration.
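The effect of lowering the ceiling can be illustrated with a small simulation of test administration: items are given in order until the number of incorrect responses reaches the ceiling. This is a hypothetical sketch, not the WJ-III administration procedure. In particular, treating the ceiling as a run of consecutive errors (the default here) is an assumption, since the note above says only "incorrect answers"; the `consecutive=False` option counts all errors instead.

```python
def administer(responses, ceiling, consecutive=True):
    """Return the item responses given before the ceiling rule ends the test.

    responses: item results in administration order (True = correct).
    ceiling: number of incorrect responses that ends the test (6 standard, 3 adapted).
    consecutive: if True, only an unbroken run of errors counts toward the ceiling
    (an assumption; the source does not say whether errors must be consecutive).
    """
    given, misses = [], 0
    for correct in responses:
        given.append(correct)
        if correct:
            if consecutive:
                misses = 0  # a correct answer breaks the run of errors
        else:
            misses += 1
            if misses >= ceiling:
                break
    return given

items = [True, False, False, True, False, False, False, True, True]
adapted = administer(items, ceiling=3)
standard = administer(items, ceiling=6)
print(len(adapted), sum(adapted), len(standard), sum(standard))
```

With the same response pattern, the lower ceiling stops the test earlier and can miss later items the child would have answered correctly, which is one reason scores under different ceiling rules are not directly comparable.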

b The Withdrawn Behavior Scale was ineligible for review for the 4-year-old cohort after the Head Start year in 2003 because the reported measure of internal consistency was below
the threshold set forth by the Early Childhood Education review protocol (version 3.0). 
Appendix C.1: Findings included in the rating for the general reading achievement domain


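Group means are followed by standard deviations in parentheses; mean difference, effect size, improvement index, and the domain averages are WWC calculations.

| Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value |
|---|---|---|---|---|---|---|---|---|
| DHHS ACF, 2010 (see note a) | | | | | | | | |
| Parent Emergent Literacy Scale (PELS) | 3-year-olds, Head Start year | 2,062 children | 2.86 (1.48) | 2.35 (1.38) | 0.51 | 0.35 | +14 | <.01 |
| PELS | 4-year-olds, Head Start year | 1,635 children | 3.76 (1.35) | 3.35 (1.40) | 0.41 | 0.30 | +12 | <.01 |
| Domain average for general reading achievement (DHHS ACF, 2010) | | | | | | 0.33 | +13 | Statistically significant |
| Domain average for general reading achievement across all studies | | | | | | 0.33 | +13 | na |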


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors 
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who 
are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change 
in an average individual’s percentile rank that can be expected if the individual is given the intervention. The WWC-computed average effect size is a simple average rounded to 
two decimal places; the average improvement index is calculated from the average effect size. The statistical significance of each study's domain average was determined by the 
WWC. Some statistics may not sum as expected due to rounding. na = not applicable.

a For DHHS ACF (2010), no corrections for clustering or multiple comparisons and no difference-in-differences adjustments were needed. The p-values presented here were reported
in the original study. This study is characterized as having a statistically significant positive effect because the effect for at least one measure within the domain is positive and 
statistically significant, and no effects are negative and statistically significant, accounting for multiple comparisons. For more information, please refer to the WWC Procedures and 
Standards Handbook (version 3.0), p. 26. 
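The effect sizes and improvement indices in this table can be reproduced from the group means and standard deviations. The sketch below uses the standard Hedges' g formula (pooled standard deviation with a small-sample correction) and converts g to an improvement index through the standard normal distribution. Assumptions: the per-group sizes (1,278/784 and 1,008/627) are taken from the cohort analytic samples in Appendix A, whose totals match the sample sizes shown here, and the domain average is taken over unrounded effect sizes; the function names are illustrative.

```python
import math

def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction (Hedges' g)."""
    pooled_sd = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_i + n_c - 2))
    omega = 1 - 3 / (4 * (n_i + n_c) - 9)  # small-sample correction factor
    return omega * (mean_i - mean_c) / pooled_sd

def improvement_index(g):
    """Expected change in percentile rank for an average comparison-group member."""
    # Phi(g) - 0.5 equals 0.5 * erf(g / sqrt(2)); multiply by 100 for percentile points.
    return 50 * math.erf(g / math.sqrt(2))

# PELS rows; group sizes assumed equal to the cohort analytic samples in Appendix A.
g3 = hedges_g(2.86, 1.48, 1278, 2.35, 1.38, 784)
g4 = hedges_g(3.76, 1.35, 1008, 3.35, 1.40, 627)
domain_avg = round((g3 + g4) / 2, 2)

for label, g in [("3-year-olds", g3), ("4-year-olds", g4), ("domain average", domain_avg)]:
    print(f"{label}: g = {g:.2f}, improvement index = {improvement_index(g):+.0f}")
```

The output matches the table: effect sizes of 0.35 and 0.30, improvement indices of +14 and +12, and a domain average of 0.33 (+13), with the average improvement index computed from the average effect size as the table notes describe.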
Appendix C.2: Findings included in the rating for the mathematics achievement domain


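Group means are followed by standard deviations in parentheses; mean difference, effect size, improvement index, and the domain averages are WWC calculations.

| Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value |
|---|---|---|---|---|---|---|---|---|
| DHHS ACF, 2010 (see note a) | | | | | | | | |
| Counting Bears Test | 4-year-olds, Head Start year | 1,617 children | 0.59 (0.49) | 0.55 (0.50) | 0.04 | 0.08 | +3 | .19 |
| Domain average for mathematics achievement (DHHS ACF, 2010) | | | | | | 0.08 | +3 | Not statistically significant |
| Domain average for mathematics achievement across all studies | | | | | | 0.08 | +3 | na |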


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors 
the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are 
given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an 
average individual's percentile rank that can be expected if the individual is given the intervention. The statistical significance of each study’s domain average was determined by 
the WWC. Some statistics may not sum as expected due to rounding. na = not applicable.

a For DHHS ACF (2010), no corrections for clustering or multiple comparisons and no difference-in-differences adjustments were needed. The p-value presented here was reported
in the original study. This study is characterized as having an indeterminate effect because the mean effect is neither statistically significant nor substantively important according to
WWC criteria. For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.
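For a domain with a single outcome, as here, the characterization turns on two checks: whether the effect is statistically significant, and, if not, whether its size reaches the WWC's substantively important threshold of 0.25 standard deviations. The sketch below is a simplified illustration of those two checks only; it does not implement the full WWC rule set for multi-outcome domains, and the function name and the p < .05 cutoff parameter are illustrative assumptions.

```python
# Effect size at or above which the WWC treats an effect as substantively important.
SUBSTANTIVE_THRESHOLD = 0.25

def characterize(effect_size, p_value, alpha=0.05):
    """Simplified single-outcome characterization; not the full WWC rule set."""
    if p_value < alpha:
        return ("statistically significant positive" if effect_size > 0
                else "statistically significant negative")
    if effect_size >= SUBSTANTIVE_THRESHOLD:
        return "substantively important positive"
    if effect_size <= -SUBSTANTIVE_THRESHOLD:
        return "substantively important negative"
    return "indeterminate"

print(characterize(0.08, 0.19))  # Counting Bears Test, 4-year-olds
```

The Counting Bears contrast (effect size 0.08, p = .19) fails both checks, which is why it is characterized as indeterminate.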
Appendix C.3: Findings included in the rating for the social-emotional development domain


Group means are followed by standard deviations in parentheses; mean difference, effect size, improvement index, and the domain averages are WWC calculations.

| Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value |
|---|---|---|---|---|---|---|---|---|
| DHHS ACF, 2010 (see note a) | | | | | | | | |
| Total Problem Behavior Scale (negative) | 3-year-olds, Head Start year | 2,062 children | 5.80 (3.56) | 6.24 (3.74) | 0.44 | 0.12 | +5 | .05 |
| Social Competencies Checklist | 3-year-olds, Head Start year | 2,061 children | 10.95 (1.40) | 10.99 (1.24) | -0.04 | -0.03 | -1 | .54 |
| Social Skills and Positive Approaches to Learning Scale | 3-year-olds, Head Start year | 2,062 children | 12.41 (1.75) | 12.38 (1.70) | 0.03 | 0.02 | +1 | .74 |
| Total Problem Behavior Scale (negative) | 4-year-olds, Head Start year | 1,629 children | 5.60 (3.83) | 5.80 (3.33) | 0.20 | 0.05 | +2 | .41 |
| Social Competencies Checklist | 4-year-olds, Head Start year | 1,631 children | 11.01 (1.46) | 11.06 (1.19) | -0.05 | -0.04 | -1 | .67 |
| Social Skills and Positive Approaches to Learning Scale | 4-year-olds, Head Start year | 1,629 children | 12.46 (1.79) | 12.48 (1.64) | -0.02 | -0.01 | 0 | .89 |
| Domain average for social-emotional development (DHHS ACF, 2010) | | | | | | 0.02 | +1 | Not statistically significant |
| Domain average for social-emotional development across all studies | | | | | | 0.02 | +1 | na |


Table Notes: For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors 
the comparison group. The outcome measures followed by “(negative)” indicate that a lower score is considered a desirable outcome, so the signs of all WWC-calculated statistics
(mean difference, effect size, and improvement index) were adjusted to reflect this. The effect size is a standardized measure of the effect of an intervention on outcomes,
representing the average change expected for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index 
is an alternate presentation of the effect size, reflecting the change in an average individual's percentile rank that can be expected if the individual is given the intervention. The 
WWC-computed average effect size is a simple average rounded to two decimal places; the average improvement index is calculated from the average effect size. The statistical 
significance of each study’s domain average was determined by the WWC. Some statistics may not sum as expected due to rounding. na = not applicable.
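The sign adjustment for “(negative)” outcomes and the simple domain average described in these notes can be sketched in a few lines. This is an illustration, not WWC software: the function names are hypothetical, and the average here is taken over the rounded effect sizes printed in the table, which may differ slightly from an average of unrounded values.

```python
import math

def adjusted_difference(mean_i, mean_c, lower_is_better=False):
    """Intervention-minus-comparison difference, sign-flipped for "(negative)" outcomes
    so that a positive value always favors the intervention group."""
    diff = mean_i - mean_c
    return -diff if lower_is_better else diff

def domain_average(effect_sizes):
    """Simple (unweighted) average of effect sizes, rounded to two decimals."""
    return round(sum(effect_sizes) / len(effect_sizes), 2)

def improvement_index(g):
    """Percentile-point change implied by effect size g: 100 * (Phi(g) - 0.5)."""
    return 50 * math.erf(g / math.sqrt(2))

# Total Problem Behavior Scale, 3-year-olds: lower scores are better.
tpbs_diff = adjusted_difference(5.80, 6.24, lower_is_better=True)

# Effect sizes as reported in the table above, in row order.
social_emotional = [0.12, -0.03, 0.02, 0.05, -0.04, -0.01]
avg = domain_average(social_emotional)
print(round(tpbs_diff, 2), avg, round(improvement_index(avg)))  # prints: 0.44 0.02 1
```

Flipping the sign before computing the effect size is what lets a lower problem-behavior score (5.80 versus 6.24) appear as a positive, intervention-favoring difference of 0.44.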

a For DHHS ACF (2010), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. This study is characterized as having an indeterminate effect because the mean effect is neither statistically significant nor
substantively important, accounting for multiple comparisons. For more information, please refer to the WWC Procedures and Standards Handbook (version 3.0), p. 26.
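The WWC's adjustment for multiple comparisons within a domain is based on the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below applies that procedure to the six reported p-values from this table. Treating all six contrasts (both cohorts) as one family is an assumption, and because the reported p-values are rounded, the result is illustrative rather than a reproduction of the WWC's calculation.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: which hypotheses remain significant after B-H."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Every hypothesis ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= cutoff
    return significant

# Reported p-values for the six social-emotional contrasts, in table order.
p_reported = [0.05, 0.54, 0.74, 0.41, 0.67, 0.89]
print(benjamini_hochberg(p_reported))
```

None of the six contrasts survives the correction, consistent with the note that the adjustment did not change which contrasts were statistically significant.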
Appendix D.1: Description of supplemental findings for the general reading achievement domain


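Group means are followed by standard deviations in parentheses; mean difference, effect size, and improvement index are WWC calculations.

| Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value |
|---|---|---|---|---|---|---|---|---|
| Puma et al., 2012 (see note a) | | | | | | | | |
| Early Childhood Longitudinal Study-Kindergarten (ECLS-K) | 4-year-old cohort, Grade 3 | 1,414 children | 98.61 (19.63) | 96.63 (20.24) | 1.98 | 0.10 | +4 | .14 |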


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the
comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average 
individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 

a For Puma et al. (2012), no corrections for clustering or multiple comparisons and no difference-in-differences adjustments were needed. The p-value presented here was reported in
the original study. 
Appendix D.2: Description of supplemental findings for the mathematics achievement domain





Group means are followed by standard deviations in parentheses; mean difference, effect size, and improvement index are WWC calculations.

| Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value |
|---|---|---|---|---|---|---|---|---|
| DHHS ACF, 2010 (see note a) | | | | | | | | |
| Woodcock-Johnson III Tests of Achievement (WJ-III), Applied Problems Subtest | 4-year-old cohort, Kindergarten | 1,533 children | 426.59 (19.55) | 426.32 (21.85) | 0.27 | 0.01 | 0 | .87 |
| WJ-III, Math Reasoning Test | 4-year-old cohort, Kindergarten | 1,533 children | 434.15 (16.53) | 434.12 (17.68) | 0.03 | 0.00 | 0 | .98 |
| WJ-III, Quantitative Concepts Test | 4-year-old cohort, Kindergarten | 1,530 children | 441.83 (17.77) | 441.88 (17.16) | -0.05 | 0.00 | 0 | .97 |
| WJ-III, Applied Problems Subtest | 4-year-old cohort, Grade 1 | 1,526 children | 455.16 (19.30) | 454.13 (19.82) | 1.03 | 0.05 | +2 | .41 |
| WJ-III, Calculation Subtest | 4-year-old cohort, Grade 1 | 1,519 children | 461.76 (18.29) | 460.46 (19.40) | 1.30 | 0.07 | +3 | .25 |
| WJ-III, Math Reasoning Test | 4-year-old cohort, Grade 1 | 1,526 children | 458.36 (17.18) | 457.70 (17.48) | 0.66 | 0.04 | +2 | .58 |
| WJ-III, Quantitative Concepts Test | 4-year-old cohort, Grade 1 | 1,524 children | 461.79 (17.49) | 461.28 (17.99) | 0.51 | 0.03 | +1 | .71 |
| Puma et al., 2012 (see note b), 4-year-old cohort | | | | | | | | |
| WJ-III, Applied Problems Subtest | 4-year-old cohort, Grade 3 | 1,422 children | 486.96 (20.37) | 487.70 (19.40) | -0.74 | -0.04 | -1 | .60 |
| WJ-III, Calculation Subtest | 4-year-old cohort, Grade 3 | 1,422 children | 491.28 (15.75) | 491.52 (16.35) | -0.24 | -0.02 | -1 | .83 |


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average 
individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 

a For DHHS ACF (2010, 4-year-old cohort), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant.
The p-values presented here were reported in the original study. 

b For Puma et al. (2012), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 
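The table notes describe the effect size and improvement index only in words. The sketch below shows one way to reproduce them. It assumes the WWC's usual Hedges' g: the mean difference divided by the pooled within-group standard deviation, multiplied by a small-sample correction. It also assumes the improvement index is derived from the standard normal CDF.

The function names are illustrative. The tables give only the total sample size, so the even split of children between the two groups is an assumption. The WWC may also work from covariate-adjusted means, so small discrepancies with the published values are possible.

```python
import math


def pooled_sd(sd_t: float, sd_c: float, n_t: int, n_c: int) -> float:
    """Pooled within-group standard deviation of the two groups."""
    num = (n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2
    return math.sqrt(num / (n_t + n_c - 2))


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Standardized mean difference with the small-sample correction factor."""
    omega = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample correction
    return omega * (mean_t - mean_c) / pooled_sd(sd_t, sd_c, n_t, n_c)


def improvement_index(g: float) -> int:
    """Expected percentile-rank change for an average comparison-group member."""
    phi = 0.5 * (1 + math.erf(g / math.sqrt(2)))  # standard normal CDF at g
    return round((phi - 0.5) * 100)


# WJ-III Calculation Subtest, Grade 1 (Appendix D.2). The 1,519 children are
# split 760/759 here only because group sizes are not reported (assumption).
g = hedges_g(461.76, 18.29, 760, 460.46, 19.40, 759)
print(f"g = {g:.2f}, improvement index = {improvement_index(g):+d}")
```

Under these assumptions the sketch prints g = 0.07 and an improvement index of +3. Both values match the Grade 1 Calculation Subtest row.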
Appendix D.3: Description of supplemental findings for the social-emotional development domain


Means are shown with standard deviations in parentheses; the mean difference, effect size, and improvement index are WWC calculations.

Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value

DHHS ACF, 2010 a
Total Problem Behavior Scale (TPBS), Aggressive Behavior Subscale (negative) | 3-year-old cohort, Head Start year | 2,062 children | 2.97 (1.71) | 3.05 (1.73) | 0.08 | 0.05 | +2 | .42
TPBS, Hyperactive Behavior Subscale (negative) | 3-year-old cohort, Head Start year | 2,062 children | 1.71 (1.51) | 2.00 (1.58) | 0.29 | 0.19 | +7 | <.01
TPBS, Withdrawn Behavior Subscale (negative) | 3-year-old cohort, Head Start year | 2,060 children | 0.55 (0.90) | 0.58 (1.00) | 0.03 | 0.03 | +1 | .71
TPBS, Aggressive Behavior Subscale (negative) | 4-year-old cohort, Head Start year | 1,630 children | 2.73 (1.77) | 2.86 (1.58) | 0.13 | 0.08 | +3 | .26
TPBS, Hyperactive Behavior Subscale (negative) | 4-year-old cohort, Head Start year | 1,630 children | 1.71 (1.51) | 1.77 (1.44) | 0.06 | 0.04 | +2 | .50
Social Competencies Checklist | 4-year-old cohort, Kindergarten | 1,556 children | 11.10 (1.33) | 11.17 (1.05) | -0.07 | -0.06 | -2 | .38
Social Skills and Positive Approaches to Learning Scale | 4-year-old cohort, Kindergarten | 1,556 children | 12.66 (1.63) | 12.63 (1.53) | 0.03 | 0.02 | +1 | .78
TPBS (negative) | 4-year-old cohort, Kindergarten | 1,555 children | 5.18 (3.88) | 4.99 (3.29) | -0.19 | -0.05 | -2 | .46
TPBS, Aggressive Behavior Subscale (negative) | 4-year-old cohort, Kindergarten | 1,556 children | 2.41 (1.82) | 2.47 (1.56) | 0.06 | 0.03 | +1 | .61
TPBS, Hyperactive Behavior Subscale (negative) | 4-year-old cohort, Kindergarten | 1,556 children | 1.53 (1.52) | 1.39 (1.46) | -0.14 | -0.09 | -4 | .17
Adjustment Scales for Preschool Intervention (ASPI), Aggressive Behavior Dimension (negative) | 4-year-old cohort, Grade 1 | 1,166 children | 48.56 (7.42) | 49.12 (7.88) | 0.56 | 0.07 | +3 | .38
ASPI, Inattentive/Hyperactive Dimension (negative) | 4-year-old cohort, Grade 1 | 1,179 children | 50.35 (8.47) | 50.50 (8.22) | 0.15 | 0.02 | +1 | .85
ASPI, Oppositional Dimension (negative) | 4-year-old cohort, Grade 1 | 1,176 children | 47.79 (7.49) | 47.88 (7.33) | 0.09 | 0.01 | 0 | .91
ASPI, Problems with Peer Interaction Dimension (negative) | 4-year-old cohort, Grade 1 | 1,226 children | 51.33 (10.99) | 51.53 (11.42) | 0.20 | 0.02 | +1 | .80
ASPI, Problems with Structured Learning Dimension (negative) | 4-year-old cohort, Grade 1 | 1,226 children | 51.03 (10.78) | 50.29 (10.68) | -0.74 | -0.07 | -3 | .31
ASPI, Problems with Teacher Interaction Dimension (negative) | 4-year-old cohort, Grade 1 | 1,226 children | 50.14 (10.38) | 48.81 (10.14) | -1.33 | -0.13 | -5 | .11
ASPI, Socially Reticent Dimension (negative) | 4-year-old cohort, Grade 1 | 1,182 children | 48.00 (7.74) | 46.76 (7.35) | -1.24 | -0.16 | -6 | .04
ASPI, Withdrawn/Low Energy Behavior Dimension (negative) | 4-year-old cohort, Grade 1 | 1,177 children | 49.87 (7.59) | 49.22 (6.95) | -0.65 | -0.09 | -4 | .26
Social Competencies Checklist | 4-year-old cohort, Grade 1 | 1,575 children | 11.09 (1.39) | 11.13 (1.17) | -0.04 | -0.03 | -1 | .53
Social Skills and Positive Approaches to Learning Scale | 4-year-old cohort, Grade 1 | 1,576 children | 11.09 (1.39) | 11.13 (1.17) | -0.04 | -0.03 | -1 | .93
TPBS (negative) | 4-year-old cohort, Grade 1 | 1,577 children | 4.84 (3.83) | 5.05 (3.79) | 0.21 | 0.06 | +2 | .45
TPBS, Aggressive Behavior Subscale (negative) | 4-year-old cohort, Grade 1 | 1,577 children | 2.20 (1.82) | 2.29 (1.75) | 0.09 | 0.05 | +2 | .48
TPBS, Hyperactive Behavior Subscale (negative) | 4-year-old cohort, Grade 1 | 1,577 children | 1.43 (1.53) | 1.46 (1.54) | 0.03 | 0.02 | +1 | .78
TPBS, Withdrawn Behavior Subscale (negative) | 4-year-old cohort, Grade 1 | 1,576 children | 0.71 (1.01) | 0.83 (1.04) | 0.12 | 0.12 | +5 | .08

Puma et al., 2012 b
Social Competencies Checklist | 4-year-old cohort, Grade 3 | 1,156 children | 0.02 (1.02) | 0.12 (1.00) | -0.10 | -0.10 | -4 | .19
Social Skills and Positive Approaches to Learning Scale | 4-year-old cohort, Grade 3 | 1,508 children | 11.95 (1.98) | 12.11 (1.91) | -0.16 | -0.08 | -3 | .21
TPBS (negative) | 4-year-old cohort, Grade 3 | 1,508 children | 5.70 (4.15) | 6.18 (4.19) | 0.48 | 0.12 | +5 | .14
TPBS, Aggressive Behavior Subscale (negative) | 4-year-old cohort, Grade 3 | 1,508 children | 2.24 (1.79) | 2.47 (1.81) | 0.23 | 0.13 | +5 | .07
TPBS, Hyperactive Behavior Subscale (negative) | 4-year-old cohort, Grade 3 | 1,508 children | 1.91 (1.65) | 1.99 (1.65) | 0.08 | 0.05 | +2 | .52
TPBS, Withdrawn Behavior Subscale (negative) | 4-year-old cohort, Grade 3 | 1,507 children | 1.02 (1.29) | 1.13 (1.17) | 0.11 | 0.09 | +4 | .16





Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. For 
mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison group. 
The outcome measures followed by “(negative)” imply that a lower score is considered a desirable outcome, so the signs of all WWC calculated statistics (mean difference, effect size, 
and improvement index) were adjusted to reflect this. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected 
for all individuals who are given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average individual’s percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding.

a For DHHS ACF (2010), a correction for multiple comparisons was needed, and this influenced the statistical significance of one outcome: ASPI-Socially Reticent Dimension after first
grade. This outcome is no longer significant after the correction is made. The correction did not affect whether any of the other contrasts were found to be statistically significant. The 
p-values presented here were reported in the original study. 

b For Puma et al. (2012), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 
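The note on measures marked "(negative)" amounts to one rule: when a lower score is better, the sign of each WWC statistic is flipped. A minimal sketch of that rule for the mean difference follows, using three rows from Appendix D.3. The `Row` type and `oriented_difference` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    measure: str
    mean_t: float          # intervention group mean
    mean_c: float          # comparison group mean
    lower_is_better: bool  # True for measures marked "(negative)"


def oriented_difference(row: Row) -> float:
    """Mean difference signed so that a positive value favors the intervention."""
    diff = row.mean_t - row.mean_c
    return -diff if row.lower_is_better else diff


rows = [
    Row("TPBS, Hyperactive Behavior Subscale (3-year-old cohort, Head Start year)", 1.71, 2.00, True),
    Row("ASPI, Socially Reticent Dimension (4-year-old cohort, Grade 1)", 48.00, 46.76, True),
    Row("Social Competencies Checklist (4-year-old cohort, Kindergarten)", 11.10, 11.17, False),
]

for row in rows:
    print(f"{row.measure}: {oriented_difference(row):+.2f}")
```

The same flip applies to the effect size and the improvement index. This is why a lower hyperactivity score for the intervention group appears as a positive mean difference of 0.29 in the table.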
Appendix D.4: Description of supplemental findings for the alphabetics domain


Means are shown with standard deviations in parentheses; the mean difference, effect size, and improvement index are WWC calculations.

Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value

DHHS ACF, 2010 a
Letter Naming Task | 4-year-old cohort, Kindergarten | 1,533 children | 22.99 (6.10) | 22.65 (6.59) | 0.34 | 0.05 | +2 | .35
Preschool Comprehensive Test of Phonological and Print Processing (Pre-CTOPPP), Elision Subtest | 4-year-old cohort, Kindergarten | 1,534 children | 321.89 (49.63) | 323.91 (47.07) | -2.02 | -0.04 | -2 | .59
Woodcock-Johnson III Tests of Achievement (WJ-III), Basic Reading Skills Test | 4-year-old cohort, Kindergarten | 1,530 children | 404.79 (31.24) | 405.39 (32.50) | -0.60 | -0.02 | -1 | .77
WJ-III, Letter-Word Identification Subtest | 4-year-old cohort, Kindergarten | 1,534 children | 378.08 (31.61) | 378.15 (33.53) | -0.07 | 0.00 | 0 | .97
WJ-III, Spelling Subtest | 4-year-old cohort, Kindergarten | 1,535 children | 413.91 (28.61) | 414.12 (29.23) | -0.21 | -0.01 | 0 | .90
WJ-III, Word Attack Subtest | 4-year-old cohort, Kindergarten | 1,530 children | 431.60 (34.35) | 432.68 (34.52) | -1.08 | -0.03 | -1 | .63
WJ-III, Basic Reading Skills Test | 4-year-old cohort, Grade 1 | 1,523 children | 451.04 (32.36) | 449.81 (33.13) | 1.23 | 0.04 | +2 | .52
WJ-III, Letter-Word Identification Subtest | 4-year-old cohort, Grade 1 | 1,525 children | 433.01 (36.22) | 432.26 (36.54) | 0.75 | 0.02 | +1 | .73
WJ-III, Spelling Subtest | 4-year-old cohort, Grade 1 | 1,527 children | 473.42 (17.92) | 472.36 (16.99) | 1.06 | 0.06 | +2 | .44
WJ-III, Word Attack Subtest | 4-year-old cohort, Grade 1 | 1,524 children | 469.10 (31.13) | 467.41 (32.76) | 1.69 | 0.05 | +2 | .34

Puma et al., 2012 b
WJ-III, Letter-Word Identification Subtest | 4-year-old cohort, Grade 3 | 1,422 children | 482.10 (29.48) | 480.60 (28.72) | 1.50 | 0.05 | +2 | .45


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average 
individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 
a For DHHS ACF (2010), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 

b For Puma et al. (2012), no corrections for clustering or multiple comparisons and no difference-in-differences adjustments were needed. The p-value presented here was reported in
the original study. 
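Several footnotes state that a correction for multiple comparisons was needed. The WWC Procedures and Standards Handbook describes the Benjamini-Hochberg procedure for this purpose. The sketch below assumes that procedure at alpha = 0.05, applied to the study-reported p-values.

It shows why none of the ten DHHS ACF (2010) alphabetics contrasts above is significant. It also shows how a lone p = .04 can lose significance, as footnote a of Appendix D.3 reports for the ASPI Socially Reticent Dimension. Treating the fourteen Grade 1 social-emotional outcomes as one family is an assumption made for illustration; the WWC's actual grouping may differ.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected under the Benjamini-Hochberg procedure.

    Returns a list of booleans in the same order as `p_values`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k with p_(k) <= (k / m) * alpha (step-up rule).
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


# DHHS ACF (2010) alphabetics contrasts, Kindergarten and Grade 1 (Appendix D.4).
alphabetics = [0.35, 0.59, 0.77, 0.97, 0.90, 0.63, 0.52, 0.73, 0.44, 0.34]

# DHHS ACF (2010) Grade 1 social-emotional contrasts (Appendix D.3), treated
# here as a single family purely for illustration.
social_grade1 = [0.38, 0.85, 0.91, 0.80, 0.31, 0.11, 0.04, 0.26,
                 0.53, 0.93, 0.45, 0.48, 0.78, 0.08]

print(any(benjamini_hochberg(alphabetics)))    # does any alphabetics contrast survive?
print(min(social_grade1) < 0.05)               # significant before correction?
print(any(benjamini_hochberg(social_grade1)))  # significant after correction?
```

With fourteen tests, the smallest p-value must fall at or below 0.05/14, about .0036, to be rejected. A p-value of .04 therefore does not survive.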
Appendix D.5: Description of supplemental findings for the cognition domain


Means are shown with standard deviations in parentheses; the mean difference, effect size, and improvement index are WWC calculations.

Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value

DHHS ACF, 2010 a
Woodcock-Johnson III Tests of Achievement (WJ-III), Pre-Academic Skills | 4-year-old cohort, Kindergarten | 1,533 children | 406.23 (22.61) | 406.48 (24.25) | -0.25 | -0.01 | 0 | .87
WJ-III, Academic Applications | 4-year-old cohort, Grade 1 | 1,524 children | 461.77 (16.92) | 461.22 (16.70) | 0.55 | 0.03 | +1 | .61
WJ-III, Academic Skills | 4-year-old cohort, Grade 1 | 1,514 children | 449.02 (23.70) | 447.71 (24.70) | 1.31 | 0.05 | +2 | .38
WJ-III, Pre-Academic Skills | 4-year-old cohort, Grade 1 | 1,525 children | 446.66 (24.32) | 445.44 (24.99) | 1.22 | 0.05 | +2 | .41


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average 
individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 
a For DHHS ACF (2010), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 
Appendix D.6: Description of supplemental findings for the comprehension domain





Means are shown with standard deviations in parentheses; the mean difference, effect size, and improvement index are WWC calculations.

Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value

DHHS ACF, 2010 a
Peabody Picture Vocabulary Test, Third Edition-Adapted (PPVT-Adapted) | 4-year-old cohort, Kindergarten | 1,535 children | 334.21 (39.08) | 331.85 (41.20) | 2.36 | 0.06 | +2 | .40
PPVT-Adapted | 4-year-old cohort, Grade 1 | 1,527 children | 363.07 (32.18) | 358.74 (32.18) | 4.33 | 0.13 | +5 | .08
Woodcock-Johnson III Tests of Achievement (WJ-III), Passage Comprehension | 4-year-old cohort, Grade 1 | 1,524 children | 450.28 (24.96) | 449.86 (23.85) | 0.42 | 0.02 | +1 | .81

Puma et al., 2012 b
PPVT-Adapted | 4-year-old cohort, Grade 3 | 1,422 children | 408.14 (29.83) | 405.74 (28.65) | 2.40 | 0.08 | +3 | .30


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the 
comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are 
given the intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an 
average individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 

a For DHHS ACF (2010), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 

b For Puma et al. (2012), no corrections for clustering or multiple comparisons and no difference-in-differences adjustments were needed. The p-value presented here was reported in
the original study. 


Head Start July 2015


Page 30 



WWC Intervention Report 


Appendix D.7: Description of supplemental findings for the language development domain 


Means are shown with standard deviations in parentheses; the mean difference, effect size, and improvement index are WWC calculations.

Outcome measure | Study sample | Sample size | Intervention group mean (SD) | Comparison group mean (SD) | Mean difference | Effect size | Improvement index | p-value

DHHS ACF, 2010 a
Woodcock-Johnson III Tests of Achievement (WJ-III), Oral Comprehension Subtest | 4-year-old cohort, Kindergarten | 1,533 children | 456.52 (19.18) | 457.29 (17.74) | -0.77 | -0.04 | -2 | .55
WJ-III, Oral Comprehension Subtest | 4-year-old cohort, Grade 1 | 1,525 children | 473.42 (17.92) | 472.36 (16.99) | 1.06 | 0.06 | +2 | .44


Table Notes: The supplemental findings presented in this table are additional findings from studies in this report that do not factor into the determination of the intervention rating. 
For mean difference, effect size, and improvement index values reported in the table, a positive number favors the intervention group and a negative number favors the comparison group. The effect size is a standardized measure of the effect of an intervention on outcomes, representing the average change expected for all individuals who are given the
intervention (measured in standard deviations of the outcome measure). The improvement index is an alternate presentation of the effect size, reflecting the change in an average 
individual's percentile rank that can be expected if the individual is given the intervention. Some statistics may not sum as expected due to rounding. 
a For DHHS ACF (2010), a correction for multiple comparisons was needed but did not affect whether any of the contrasts were found to be statistically significant. The p-values
presented here were reported in the original study. 
Endnotes

1 The descriptive information for this program was obtained from a publicly available source: the Head Start website (http://www.acf.hhs.gov/programs/ohs, downloaded June 2014). The WWC requests that program offices review the program description sections
for accuracy from their perspective. The program description was provided to the program office in March 2014, and the WWC 
incorporated feedback from the office. Further verification of the accuracy of the descriptive information for this program is beyond 
the scope of this review. 

2 At least 90% of enrolled children must be from families with incomes below the federal poverty level or families receiving public assistance,
or must be homeless. Children may also be enrolled if they are in foster care. In addition, Head Start programs must reserve 10% of
enrollment slots for children with diagnosed disabilities.

3 This intervention report focuses on the effectiveness of the three most common models of Head Start (center-based programs
with home visits, family child care programs, and home visits) and any combination of these three models. There are other types of
Head Start programs, including Early Head Start, Migrant and Seasonal Head Start, and Family and Community Partnerships Head
Start; these alternatives were not reviewed for this intervention report. This intervention report does not include research assessing
the effectiveness of particular curricula used in Head Start; rather, it focuses on the effectiveness of attending
a Head Start program.

4 The literature search reflects documents publicly available by July 2014. The studies in this report were reviewed using the
Standards from the WWC Procedures and Standards Handbook (version 3.0), along with those described in the Early Childhood
Education review protocol (version 3.0). The evidence presented in this report is based on available research. Findings and
conclusions may change as new research becomes available. A quick review blast of DHHS ACF (2010) was released and revised in 2010;
it rated the study as meets WWC group design standards with reservations for both cohorts due to high attrition. Based on
analytic sample sizes from the series of reports, an analysis in the series has low attrition, so the study is now rated meets
WWC group design standards without reservations. The quick review blast included ratings for follow-up comparisons; this
intervention report presents follow-up findings as supplemental findings that do not factor into the intervention’s rating of effectiveness.
Additionally, none of the follow-up comparisons for the 3-year-old cohort meet WWC group design standards, so they are not presented in the
supplemental findings. For these analyses, approximately 50% of the children originally assigned to the comparison group enrolled in
Head Start as 4-year-olds. Finally, though the quick review blast focused only on academic outcomes, this intervention report includes
outcomes in additional domains. In the 2010 quick review blast, DHHS ACF (2010) was cited as Puma et al. (2010).

5 Absence of conflict of interest: This intervention report included studies conducted by staff from Abt Associates Inc. Because Abt 
Associates Inc. is one of the contractors that administers the WWC, those studies were rated by staff members from a different 
organization. The report was reviewed by the lead methodologist, a WWC Quality Assurance reviewer, and an external peer reviewer. 

6 For criteria used in the determination of the rating of effectiveness and extent of evidence, see the WWC Rating Criteria on p. 34. 
These improvement index numbers show the average and range of individual-level improvement indices for all findings across the 
studies. 

7 In this report, “grantees” refer to organizations that administered and had financial responsibility for programs, and “delegates” refer 
to organizations that were subcontracted to grantees to administer programs. 

8 The number of grantees and Head Start centers is based on both the 3-year-old and 4-year-old cohorts. The total number of
grantees and centers indicated in the report may not reflect the total number of grantees and centers that actually participated for either
cohort in each study. The sample of Puerto Rican students was analyzed separately in each study; these analyses were not reviewed
for inclusion in this intervention report, as the assessments were provided only in Spanish, which is outside the scope of review for the
Early Childhood Education topic area (version 3.0).

9 These numbers do not include children in Puerto Rico (see Endnote 8). 

10 This percentage includes Head Start programs in Puerto Rico. 

11 These numbers do not include children in Puerto Rico (see Endnote 8). 

12 This percentage includes Head Start programs in Puerto Rico. 

13 The review protocol stipulates that children must be between 3 and 5 years of age at the time of the intervention and prioritizes the
immediate posttest for determination of effectiveness. Outcomes collected when students are older (for example, in elementary or
middle school) are eligible for review and are included as supplemental findings in this intervention report. Note that additional outcomes
were measured for the cohort of 3-year-old children in 2004, 2005, 2006, and 2008 to reflect additional follow-up outcomes at
preschool, kindergarten, first grade, and third grade, respectively. However, approximately 50% of the 3-year-old children who were
originally assigned to the comparison group eventually enrolled in Head Start as 4-year-olds. Because the research design did not
maintain the desired contrast after the first year, follow-up contrasts for the 3-year-old cohort collected after more than 1 year of
Head Start are not included in this review.

14 According to the WWC Procedures and Standards Handbook (version 3.0), only composite test measures can contribute to the rating
for the intervention when both composite test measures and their components are reported (Handbook, p. 17). Contrasts of subtest 
outcomes that meet WWC group design standards with or without reservations are still included in this intervention report as 
supplemental findings. 

15 This is a composite measure of three subscales. Contrasts using these subscales were also reported in the study and meet WWC
group design standards without reservations: (a) Aggressive Behavior Scale, (b) Hyperactive Behavior Scale, and (c) Withdrawn
Behavior Scale.

16 The procedure for classifying an effect based on multiple univariate outcomes within a single domain can be found in the WWC 
Procedures and Standards Handbook (version 3.0), Table IV.2 (p. 26). 

17 The WWC identified one additional source related to DHHS ACF (2010). The study does not contribute unique information to 
Appendix A.1 and is not listed here. 

18 Head Start programs in Puerto Rico were included in the original sample, but were analyzed separately and not included in 
this report. See Endnote 8 for additional details. 

19 Agencies were stratified by geographic proximity; program percentage of Hispanic and African-American children; region; location
(e.g., urban, rural); program auspice (i.e., whether programs were based in schools); whether programs were part-day only,
full-day only, or both; and the percentage of a program’s enrollment made up of entering 3-year-old children.

20 For more information on the FACES study, see http://www.acf.hhs.gov/programs/opre/research/project/head-start-family-and-child-experiences-survey-faces.

21 This percentage includes Head Start programs in Puerto Rico. 

22 This percentage includes Head Start programs in Puerto Rico. 

Recommended Citation 

U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse. (2015, July). 

Early Childhood Education intervention report: Head Start. Retrieved from http://whatworks.ed.gov 
WWC Rating Criteria

Criteria used to determine the rating of a study

Study rating | Criteria
Meets WWC group design standards without reservations | A study that provides strong evidence for an intervention’s effectiveness, such as a well-implemented RCT.
Meets WWC group design standards with reservations | A study that provides weaker evidence for an intervention’s effectiveness, such as a QED or an RCT with high attrition that has established equivalence of the analytic samples.

Criteria used to determine the rating of effectiveness for an intervention 

Rating of effectiveness 

Criteria 

Positive effects 

Two or more studies show statistically significant positive effects, at least one of which met WWC group design 
standards for a strong design, AND 

No studies show statistically significant or substantively important negative effects. 

Potentially positive effects 

At least one study shows a statistically significant or substantively important positive effect, AND 

No studies show a statistically significant or substantively important negative effect AND fewer or the same number
of studies show indeterminate effects than show statistically significant or substantively important positive effects.

Mixed effects 

At least one study shows a statistically significant or substantively important positive effect AND at least one study 
shows a statistically significant or substantively important negative effect, but no more such studies than the number 
showing a statistically significant or substantively important positive effect, OR 

At least one study shows a statistically significant or substantively important effect AND more studies show an 
indeterminate effect than show a statistically significant or substantively important effect. 

Potentially negative effects 

One study shows a statistically significant or substantively important negative effect and no studies show a 
statistically significant or substantively important positive effect, OR 

Two or more studies show statistically significant or substantively important negative effects, at least one study 
shows a statistically significant or substantively important positive effect, and more studies show statistically 
significant or substantively important negative effects than show statistically significant or substantively important 
positive effects. 

Negative effects 

Two or more studies show statistically significant negative effects, at least one of which met WWC group design 
standards for a strong design, AND 

No studies show statistically significant or substantively important positive effects. 

No discernible effects 

None of the studies shows a statistically significant or substantively important effect, either positive or negative. 
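The effectiveness criteria above are a decision rule over counts of studies. The sketch below encodes them under two inputs it takes as given. First, each study's finding in a domain has already been classified as positive, negative, or indeterminate, where positive or negative means statistically significant or substantively important. Second, each finding is flagged for whether it is statistically significant and whether it comes from a strong design.

The `StudyFinding` type and `rate_effectiveness` function are hypothetical. Two further choices are assumptions rather than WWC rules: the categories are checked in the order listed above, and "One study" under Potentially negative effects is read as exactly one. Combinations that the listed criteria do not cover return a placeholder string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudyFinding:
    direction: str       # "positive", "negative", or "indeterminate"
    significant: bool    # statistically significant, not merely substantively important
    strong_design: bool  # meets WWC group design standards without reservations


def rate_effectiveness(findings: List[StudyFinding]) -> str:
    """Apply the rating-of-effectiveness criteria, checked in the order listed."""
    pos = [f for f in findings if f.direction == "positive"]
    neg = [f for f in findings if f.direction == "negative"]
    n_indeterminate = sum(f.direction == "indeterminate" for f in findings)
    sig_pos = [f for f in pos if f.significant]
    sig_neg = [f for f in neg if f.significant]

    if len(sig_pos) >= 2 and any(f.strong_design for f in sig_pos) and not neg:
        return "Positive effects"
    if pos and not neg and n_indeterminate <= len(pos):
        return "Potentially positive effects"
    if (pos and neg and len(neg) <= len(pos)) or (
        (pos or neg) and n_indeterminate > len(pos) + len(neg)
    ):
        return "Mixed effects"
    if (len(neg) == 1 and not pos) or (len(neg) >= 2 and pos and len(neg) > len(pos)):
        return "Potentially negative effects"
    if len(sig_neg) >= 2 and any(f.strong_design for f in sig_neg) and not pos:
        return "Negative effects"
    if not pos and not neg:
        return "No discernible effects"
    return "Not covered by the listed criteria"


# A single strong-design study whose only finding in a domain is indeterminate.
print(rate_effectiveness([StudyFinding("indeterminate", False, True)]))
```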

Criteria used to determine the extent of evidence for an intervention 

Extent of evidence 

Criteria 

Medium to large 

The domain includes more than one study, AND 
The domain includes more than one school, AND 

The domain findings are based on a total sample size of at least 350 students, OR, assuming 25 students in a class, 
a total of at least 14 classrooms across studies. 

Small 

The domain includes only one study, OR 
The domain includes only one school, OR 

The domain findings are based on a total sample size of fewer than 350 students, AND, assuming 25 students in a 
class, a total of fewer than 14 classrooms across studies. 
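The extent-of-evidence criteria can likewise be written as a small function. This is a sketch; the function name and parameters are hypothetical. The school count in the usage line is a placeholder, because the number of centers is not given in this section.

DHHS ACF (2010) is the only study contributing to each domain. The rating is therefore small regardless of its sample size, consistent with the small extent of evidence reported for Head Start.

```python
from typing import Optional


def extent_of_evidence(n_studies: int, n_schools: int, n_students: int,
                       n_classrooms: Optional[int] = None) -> str:
    """Classify a domain's extent of evidence as 'Medium to large' or 'Small'."""
    # 14 classrooms restates the 350-student threshold under the
    # 25-students-per-class assumption (14 * 25 = 350).
    enough_sample = n_students >= 350 or (n_classrooms is not None and n_classrooms >= 14)
    if n_studies > 1 and n_schools > 1 and enough_sample:
        return "Medium to large"
    # "Small" is exactly the complement of the medium-to-large criteria.
    return "Small"


# One study with 3,697 children; the school count of 100 is a placeholder.
print(extent_of_evidence(n_studies=1, n_schools=100, n_students=3697))
```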
Glossary of Terms

Attrition Attrition occurs when an outcome variable is not available for all participants initially assigned
to the intervention and comparison groups. The WWC considers the total attrition rate and
the difference in attrition rates across groups within a study.

Clustering adjustment If intervention assignment is made at a cluster level and the analysis is conducted at the student
level, the WWC will adjust the statistical significance to account for this mismatch, if necessary.

Confounding factor A confounding factor is a component of a study that is completely aligned with one of the
study conditions, making it impossible to separate how much of the observed effect was
due to the intervention and how much was due to the factor.

Design The design of a study is the method by which intervention and comparison groups were assigned.

Domain A domain is a group of closely related outcomes.

Effect size The effect size is a measure of the magnitude of an effect. The WWC uses a standardized
measure to facilitate comparisons across studies and outcomes.

Eligibility A study is eligible for review and inclusion in this report if it falls within the scope of the
review protocol and uses either an experimental or matched comparison group design.

Equivalence A demonstration that the analytic sample groups are similar on observed characteristics
defined in the review area protocol.

Extent of evidence An indication of how much evidence supports the findings. The criteria for the extent
of evidence levels are given in the WWC Rating Criteria on p. 34.

Improvement index Along a percentile distribution of individuals, the improvement index represents the gain
or loss of the average individual due to the intervention. As the average individual starts at
the 50th percentile, the measure ranges from -50 to +50.

Intervention An educational program, product, practice, or policy aimed at improving student outcomes.

Intervention report A summary of the findings of the highest-quality research on a given program, product,
practice, or policy in education. The WWC searches for all research studies on an intervention,
reviews each against design standards, and summarizes the findings of those that
meet WWC design standards.

Multiple comparison adjustment When a study includes multiple outcomes or comparison groups, the WWC will adjust
the statistical significance to account for the multiple comparisons, if necessary.

Quasi-experimental design (QED) A quasi-experimental design (QED) is a research design in which study participants are
assigned to intervention and comparison groups through a process that is not random.

Randomized controlled trial (RCT) A randomized controlled trial (RCT) is an experiment in which eligible study participants are
randomly assigned to intervention and comparison groups.

Rating of effectiveness The WWC rates the effects of an intervention in each domain based on the quality of the
research design and the magnitude, statistical significance, and consistency in findings. The
criteria for the ratings of effectiveness are given in the WWC Rating Criteria on p. 34.

Single-case design A research approach in which an outcome variable is measured repeatedly within and
across different conditions that are defined by the presence or absence of an intervention.
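The effect size, improvement index, and equivalence entries all rest on a standardized difference between groups. The Python sketch below shows how they connect. It assumes that the WWC's standardized measure is Hedges' g with the usual small-sample correction and that outcomes are roughly normal. The baseline equivalence thresholds (0.05 and 0.25 standard deviations) are assumed from the WWC Procedures and Standards Handbook rather than stated in this report, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Hedges' g) with a small-sample correction."""
    n = n_t + n_c
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n - 2))
    omega = 1 - 3 / (4 * n - 9)  # small-sample correction factor
    return omega * (mean_t - mean_c) / pooled_sd


def improvement_index(g):
    """Percentile rank of the average intervention-group member within the
    comparison-group distribution, minus 50 (assumes normal outcomes)."""
    return NormalDist().cdf(g) * 100 - 50


def baseline_equivalence(baseline_g):
    """Assumed Handbook thresholds on the absolute baseline effect size."""
    d = abs(baseline_g)
    if d <= 0.05:
        return "satisfied"
    if d <= 0.25:
        return "satisfied only with statistical adjustment"
    return "not satisfied"


# Hypothetical outcome: 100 students per group, 2-point difference, SD of 10.
g = hedges_g(mean_t=52.0, mean_c=50.0, sd_t=10.0, sd_c=10.0, n_t=100, n_c=100)
print(round(g, 3), round(improvement_index(g), 1))
```

Mapping g through the normal CDF is what bounds the improvement index between -50 and +50: an effect size of zero leaves the average individual at the 50th percentile, so the index is 0.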
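The attrition entry names two quantities: the overall rate and the difference between group rates. The minimal Python sketch below computes both from assigned and analyzed counts. The function and argument names are hypothetical, and the sketch does not encode whether a given pair of rates is acceptable, because that depends on the attrition boundary set in the review protocol.

```python
def attrition_rates(assigned_t, analyzed_t, assigned_c, analyzed_c):
    """Return (overall, differential) attrition as proportions."""
    attrition_t = 1 - analyzed_t / assigned_t  # intervention group
    attrition_c = 1 - analyzed_c / assigned_c  # comparison group
    overall = 1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c)
    differential = abs(attrition_t - attrition_c)
    return overall, differential


# Hypothetical study: 100 assigned to each group; 80 and 90 have outcome data.
overall, differential = attrition_rates(assigned_t=100, analyzed_t=80, assigned_c=100, analyzed_c=90)
print(f"overall {overall:.0%}, differential {differential:.0%}")
```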
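For the clustering adjustment, the Python sketch below assumes the cluster correction given in the WWC Procedures and Standards Handbook (version 3.0). A student-level t statistic is shrunk according to the average cluster size and an intraclass correlation (ICC), and the degrees of freedom are recomputed. The ICC is an input here (the Handbook supplies default values when a study reports none). The function name is hypothetical, and the corrected p-value would then come from a t distribution with the returned degrees of freedom.

```python
import math


def cluster_adjusted_t(t, n_total, n_clusters, icc):
    """Shrink a student-level t statistic for cluster-level assignment.

    Returns (adjusted t, adjusted degrees of freedom).
    """
    N = n_total
    n = n_total / n_clusters  # average cluster size
    core = (N - 2) - 2 * (n - 1) * icc
    t_adj = t * math.sqrt(core / ((N - 2) * (1 + (n - 1) * icc)))
    df = core**2 / (
        (N - 2) * (1 - icc) ** 2
        + n * (N - 2 * n) * icc**2
        + 2 * (N - 2 * n) * icc * (1 - icc)
    )
    return t_adj, df


# Hypothetical study: 200 students in 10 classrooms, ICC assumed to be 0.20.
print(cluster_adjusted_t(t=3.0, n_total=200, n_clusters=10, icc=0.20))
```

With an ICC of zero, the correction leaves t unchanged and returns the usual N - 2 degrees of freedom. A positive ICC shrinks both, which is why a finding that is significant in a student-level analysis may not be significant once clustering is accounted for.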


Standard deviation The standard deviation of a measure shows how much variation exists across observations 
in the sample. A low standard deviation indicates that the observations in the sample tend 
to be very close to the mean; a high standard deviation indicates that the observations in 
the sample tend to be spread out over a large range of values. 

Statistical significance Statistical significance indicates how unlikely it is that a difference between groups at least
as large as the one observed would arise by chance alone if there were no real difference between the groups.
The WWC labels a finding statistically significant if this probability is less than 5% (p < .05).


Substantively important A substantively important finding is one that has an effect size of 0.25 or greater, regardless
of statistical significance.

Systematic review A review in which the existing literature on a topic is identified and reviewed using explicit
methods. A WWC systematic review has five steps: 1) developing a review protocol; 2) searching
the literature; 3) reviewing studies, including screening studies for eligibility, reviewing the
methodological quality of each study, and reporting on high-quality studies and their findings;
4) combining findings within and across studies; and 5) summarizing the review.


Please see the WWC Procedures and Standards Handbook (version 3.0) for additional details. 
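The statistical significance, substantively important, and multiple comparison adjustment entries interact when a study reports several outcomes in one domain. The Python sketch below assumes the Benjamini-Hochberg procedure, which the WWC Handbook uses for multiple comparisons. It also treats the 0.25 threshold as applying to the magnitude of the effect size, so large negative effects count as well. Both function names are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return True for each p-value that stays significant after the
    Benjamini-Hochberg false discovery rate adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value is at or below (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    significant = [False] * m
    # Every finding ranked at or below the cutoff is declared significant.
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff
    return significant


def describe_finding(g, significant):
    """Label one finding; the 0.25 threshold is applied to |g| (assumption)."""
    labels = []
    if significant:
        labels.append("statistically significant")
    if abs(g) >= 0.25:
        labels.append("substantively important")
    return labels or ["neither"]


# Hypothetical domain with three outcomes: (effect size, unadjusted p-value).
findings = [(0.30, 0.04), (0.12, 0.01), (0.05, 0.20)]
flags = benjamini_hochberg([p for _, p in findings])
for (g, p), sig in zip(findings, flags):
    print(g, p, describe_finding(g, sig))
```

In this made-up domain, the first outcome (p = .04) would be significant on its own but not after the adjustment. It is still labeled substantively important because its effect size is at least 0.25.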
Intervention Report | Practice Guide | Quick Review | Single Study Review

An intervention report summarizes the findings of high-quality research on a given program, practice, or policy in 
education. The WWC searches for all research studies on an intervention, reviews each against evidence standards, 
and summarizes the findings of those that meet standards. 


This intervention report was prepared for the WWC by Mathematica Policy Research under contract ED-IES-13-C-0010. 